Abstract

We construct pseudorandom generators that fool functions of halfspaces (threshold functions) 
under a very broad class of product distributions. This class includes not only familiar cases such 
as the uniform distribution on the discrete cube, the uniform distribution on the solid cube, and 
the multivariate Gaussian distribution, but also includes any product of discrete distributions 
with probabilities bounded away from 0. 

Our first main result shows that a recent pseudorandom generator construction of Meka
and Zuckerman [MZ09], when suitably modified, can fool arbitrary functions of $d$ halfspaces
under product distributions where each coordinate has bounded fourth moment. To $\epsilon$-fool
any size-$s$, depth-$d$ decision tree of halfspaces, our pseudorandom generator uses seed length
$O((d\log(ds/\epsilon) + \log n) \cdot \log(ds/\epsilon))$. For monotone functions of $d$ halfspaces, the seed length can
be improved to $O((d\log(d/\epsilon) + \log n) \cdot \log(d/\epsilon))$. We get better bounds for larger $\epsilon$; for example,
to $1/\mathrm{polylog}(n)$-fool all monotone functions of $(\log n)/\log\log n$ halfspaces, our generator requires
a seed of length just $O(\log n)$.

Our second main result generalizes the work of Diakonikolas et al. [DGJ+09] to show that
bounded independence suffices to fool functions of halfspaces under product distributions.
Assuming each coordinate satisfies a certain stronger moment condition, we show that any function
computable by a size-$s$, depth-$d$ decision tree of halfspaces is $\epsilon$-fooled by
$\tilde{O}(d^4 s^2/\epsilon^2)$-wise independence.

Our technical contributions include: a new multidimensional version of the classical
Berry-Esseen theorem; a derandomization thereof; a generalization of Servedio's [Ser07] regularity
lemma for halfspaces which works under any product distribution with bounded fourth moments;
an extension of this regularity lemma to functions of many halfspaces; and a new analysis of the
sandwiching polynomials technique of Bazzi [Baz09] for arbitrary product distributions.
1 Introduction



Halfspaces, or threshold functions, are a central class of Boolean-valued functions. A halfspace is a
function $h : \mathbb{R}^n \to \{0,1\}$ of the form $h(x_1, \dots, x_n) = 1[w_1 x_1 + \cdots + w_n x_n > \theta]$, where the weights
$w_1, \dots, w_n$ and the threshold $\theta$ are arbitrary real numbers. These functions have been studied
extensively in theoretical computer science, social choice theory, and machine learning. In computer
science, they were first studied in the context of switching circuits; see for instance [Der65, Hu65,
LC67, She69, Mur71]. Halfspaces (with non-negative weights) have also been studied extensively
in game theory and social choice theory as models for voting; see e.g. [Pen46, Isb69, DS79, TZ92].
Halfspaces are also ubiquitous in machine learning contexts, playing a key role in many important
algorithmic techniques, such as Perceptron, Support Vector Machines, Neural Networks,
and AdaBoost. One of the outstanding open problems in circuit lower bounds is to find an explicit
function that cannot be computed by a depth-two circuit ("neural network") of threshold
gates [HMP+93, Kra91, KW91, FKL+01].
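The definition above can be made concrete with a minimal Python sketch. The helper names `halfspace` and `intersection` are illustrative assumptions, not notation from this paper; the sketch only evaluates a halfspace $1[w \cdot x > \theta]$ and the AND of several halfspaces, the simplest "function of halfspaces" considered later.

```python
from typing import Callable, Sequence

Point = Sequence[float]

def halfspace(weights: Sequence[float], theta: float) -> Callable[[Point], int]:
    """Return h with h(x) = 1 if w_1 x_1 + ... + w_n x_n > theta, else 0."""
    def h(x: Point) -> int:
        return int(sum(w * xi for w, xi in zip(weights, x)) > theta)
    return h

def intersection(halfspaces: Sequence[Callable[[Point], int]]) -> Callable[[Point], int]:
    """Return the AND of the given halfspaces (a monotone function of them)."""
    def f(x: Point) -> int:
        return int(all(h(x) == 1 for h in halfspaces))
    return f

# Majority of three +/-1 votes is the halfspace with unit weights and threshold 0.
majority = halfspace([1, 1, 1], 0)

# The positive quadrant of the plane is an intersection of two halfspaces.
quadrant = intersection([halfspace([1, 0], 0), halfspace([0, 1], 0)])
```

For instance, `majority((1, 1, -1))` evaluates to 1 and `quadrant((1, -1))` evaluates to 0.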



In this work we investigate the problem of constructing explicit pseudorandom generators for 
functions of halfspaces. 

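Definition 1.1. A function $G : \{0,1\}^s \to B$ is a pseudorandom generator (PRG) with seed length
$s$ and error $\epsilon$ for a class $\mathcal{F}$ of functions from $B$ to $\{0,1\}$ under distribution $\mathcal{D}$ on $B$ (or, more
succinctly, $G$ $\epsilon$-fools $\mathcal{F}$ under $\mathcal{D}$ with seed length $s$) if for all $f \in \mathcal{F}$,

$$\left| \Pr_{X \sim \mathcal{D}}[f(X) = 1] - \Pr_{Y \sim \{0,1\}^s}[f(G(Y)) = 1] \right| \le \epsilon.$$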



Under the widely-believed complexity-theoretic assumption BPP = P, there must be a determin- 
istic algorithm that can approximate the fraction of satisfying assignments to any polynomial-size 
circuit of threshold gates. Finding such an algorithm even for simple functions of halfspaces has 
proven to be a difficult derandomization problem. Very recently, however, there has been a burst 
of progress on constructing PRGs for halfspaces [RS08, DGJ+09, MZ09]. The present paper makes
progress on this problem in several different directions, as do several concurrent and independent
works [HKM09, DKN09, BELY09].

This flurry of work on PRGs for functions of halfspaces has several motivations beyond its status 
as a fundamental derandomization task. For one, it can be seen as a natural geometric problem, 
with connections to deterministic integration; for instance, the problem of constructing PRGs for 
halfspaces under the uniform distribution on the n-dimensional sphere amounts to constructing a 
poly(n)-sized set that hits every spherical cap with roughly the right frequency [RS08]. Second,
PRGs for halfspaces have applications in streaming algorithms [GR09], while PRGs for functions
of halfspaces can be used to derandomize the Goemans-Williamson Max-Cut algorithm, algorithms
for approximate counting, algorithms for dimension reduction, and intractability results in
computational learning [KS08]. Finally, proving lower bounds for the class $\mathrm{TC}^0$ of small-depth threshold
circuits is an outstanding open problem in circuit complexity. An explicit PRG for a class is easily 
seen to imply lower bounds against that class. Constructions of explicit PRGs might shed light on 
structural properties of threshold circuits and the lower bound problem. 



1.1 Previous work 

The work of Rabani and Shpilka [RS08] constructed a hitting set generator for halfspaces under
the uniform distribution on the sphere. Diakonikolas et al. [DGJ+09] constructed the first PRG
for halfspaces over bits; i.e., the uniform distribution on $\{-1,1\}^n$. They showed that any $k$-wise
independent distribution $\epsilon$-fools halfspaces with respect to the uniform distribution for $k = \tilde{O}(1/\epsilon^2)$,
giving PRGs with seed length $(\log n) \cdot \tilde{O}(1/\epsilon^2)$.

Meka and Zuckerman constructed a pseudorandom generator that $\epsilon$-fools degree-$d$ polynomial
threshold functions ("PTFs", a generalization of halfspaces) over uniformly random bits with seed
length $(\log n)/\epsilon^{O(d)}$ [MZ09]. Their generator is a simplified version of Rabani and Shpilka's hitting
set generator. In the case of halfspaces, they combine their generator with generators for
small-width branching programs due to Nisan and Nisan-Zuckerman [Nis92, NZ96] to bring the seed
length down to $O((\log n)\log(1/\epsilon))$. This is the only previous or independent work where the seed
length depends logarithmically on $1/\epsilon$.

1.2 Independent concurrent work 

Independently and concurrently, a number of other researchers have extended some of the afore- 
mentioned results, mostly to intersections of halfspaces and polynomial threshold functions over 
the hypercube or Gaussian space. 

Diakonikolas et al. [DKN09] showed that $\tilde{O}(1/\epsilon^9)$-wise independence suffices to fool degree-2
PTFs under the uniform distribution on the hypercube and under the Gaussian distribution. They
also prove that $\mathrm{poly}(d, 1/\epsilon)$-wise independence suffices to fool intersections of $d$ degree-2 PTFs in
these settings.

Harsha et al. [HKM09] obtain a PRG that fools intersections of $d$ halfspaces under the Gaussian
distribution with seed length $O((\log n) \cdot \mathrm{poly}(\log d, 1/\epsilon))$. They obtain similar parameters for
intersections of $d$ "regular" halfspaces under the uniform distribution on $\{-1,1\}^n$ (a halfspace is
regular if all of its coefficients have small magnitude compared to their sum of squares).

Ben-Eliezer et al. [BELY09] showed that roughly $\exp((d/\epsilon)^d)$-wise independence $\epsilon$-fools degree-$d$
PTFs which depend on a small number of linear functions.

1.3 Our Results 

In this work, we construct pseudorandom generators for arbitrary functions of halfspaces under (al- 
most) arbitrary product distributions. Our work diverges from previous work in making minimal 
assumptions about the distribution we are interested in, and in allowing general functions of halfs- 
paces. For both of our main results, we only assume that the distribution is a product distribution 
where each coordinate satisfies some mild conditions on its moments. These conditions include 
most distributions of interest, such as the Gaussian distribution, the uniform distribution on the 
hypercube, the uniform distribution on the solid cube, and discrete distributions with probabilities 
bounded away from 0. Our results can also be used to fool the uniform distribution on the sphere, 
even though it is not a product distribution. This allows us to derandomize the hardness result of 
Khot and Saket [KS08] for learning intersections of halfspaces.

We also allow for arbitrary functions of d halfspaces, although the seed length improves sig- 
nificantly if we consider monotone functions or small decision trees. In particular, we get strong 
results for intersections of halfspaces. 

1.3.1 The Meka-Zuckerman Generator

We show that a suitable modification of the Meka-Zuckerman (MZ) generator can fool arbitrary
functions of $d$ halfspaces under any product distribution, where the distribution on each
coordinate has bounded fourth moments. More precisely, we consider product distributions on
$X = (x_1, \dots, x_n)$ where for every $i \in [n]$, $E[x_i] = 0$, $E[x_i^2] = 1$, and $E[x_i^4] \le C$, where $C \ge 1$ is a
parameter of the generator $G$. We say that the distribution $X$ has $C$-bounded fourth moments.
We get our best results for monotone functions of $d$ halfspaces, such as intersections of $d$
halfspaces. For distributions with polynomially bounded fourth moments, our modified MZ PRG fools
the intersection of $d$ halfspaces with polynomially small error using a seed of length $O(d\log^2 n)$.
Many natural distributions have $O(1)$-bounded fourth moments. Even for $\mathrm{polylog}(n)$-bounded
fourth moments, our PRG fools the intersection of $(\log n)/\log\log n$ halfspaces with error $1/\mathrm{polylog}(n)$
using a seed of length just $O(\log n)$. Both of these cases are captured in the following theorem.

Theorem 1.2. Let $X$ be sampled from a product distribution on $\mathbb{R}^n$ with $C$-bounded fourth
moments. The modified MZ generator $\epsilon$-fools any monotone function of $d$ halfspaces with seed length
$O((d\log(Cd/\epsilon) + \log n)\log(Cd/\epsilon))$. When $Cd/\epsilon > \log^{-c} n$ for any $c > 0$, the seed length becomes
$O(d\log(Cd/\epsilon) + \log n)$.

As a corollary, we get small seed length for functions of halfspaces that have small decision
tree complexity. In the theorem below we could even take $s$ to be the minimum of the number of
0-leaves and 1-leaves.

Theorem 1.3. Let $X$ be as in Theorem 1.2. The modified MZ generator $\epsilon$-fools any size-$s$, depth-$d$
function of halfspaces, using a seed of length $O((d\log(Cds/\epsilon) + \log n)\log(Cds/\epsilon))$. When $Cds/\epsilon >
\log^{-c} n$ for any $c > 0$, the seed length becomes $O(d\log(Cds/\epsilon) + \log n)$.

Since the decision tree complexity is at most $2^d$, we deduce the following.

Corollary 1.4. Let $X$ be as in Theorem 1.2. The modified MZ generator $\epsilon$-fools any function of
$d$ halfspaces, using a seed of length $O((d^2 + d\log(Cd/\epsilon) + \log n)(d + \log(Cd/\epsilon)))$. When $Cd2^d/\epsilon >
\log^{-c} n$ for any $c > 0$, the seed length becomes $O(d^2 + d\log(Cd/\epsilon) + \log n)$.

1.3.2 Bounded Independence fools functions of halfspaces 

We prove that under a large class of product distributions, bounded independence suffices to fool 
functions of $d$ halfspaces. This significantly generalizes the result of Diakonikolas et al. [DGJ+09],
who proved that bounded independence fools halfspaces under the uniform distribution on $\{-1,1\}^n$.
The condition necessary on the product distributions is unfortunately somewhat technical; we state 
here a theorem that covers the main cases of interest: 

Theorem 1.5. Suppose $f$ is computable as a size-$s$, depth-$d$ function of halfspaces over the
independent random variables $x_1, \dots, x_n$. If we assume the $x_j$'s are discrete, then $k$-wise independence
suffices to $\epsilon$-fool $f$, where

$$k = \tilde{O}(d^4 s^2/\epsilon^2) \cdot \mathrm{poly}(1/\alpha).$$

Here $0 < \alpha < 1$ is the least nonzero probability of any outcome for an $x_j$. Moreover, the same
result holds with $\alpha = 1$ for certain continuous random variables $x_j$, including Gaussians (possibly
of different variance) and random variables which are uniform on (possibly different) intervals.

For example, whenever $\alpha \ge 1/\mathrm{polylog}(d/\epsilon)$ it holds that $\tilde{O}(d^6/\epsilon^2)$-wise independence suffices
to $\epsilon$-fool intersections of $d$ halfspaces. For random variables that do not satisfy the hypotheses of
Theorem 1.5, it may still be possible to extract a similar statement from our techniques. Roughly
speaking, the essential requirement is that the random variables $x_j$ be "$(p, 2, p^{-c})$-hypercontractive"
for large values of $p$ and some constant $c < 1$.
Notation: Throughout, all random variables take values in $\mathbb{R}$ or $\mathbb{R}^d$. Random variables will
be in boldface. Real scalars will be lower-case letters; real vectors will be upper-case letters.
If $X$ is a $d$-dimensional vector, we will write $X[1], X[2], \dots, X[d]$ for its coordinate values and
$\|X\| = \sqrt{\sum_{i=1}^d X[i]^2}$ for its Euclidean length. When $M$ is a matrix, we also use the notation
$M[i,j]$ for its $(i,j)$ entry. If $X$ is a vector-valued random variable, we write $\|X\|_p = E[\|X\|^p]^{1/p}$.

We typically use $i$ to index dimensions and $j$ to index sequences. Given $x \in \mathbb{R}$ we define $\mathrm{sgn}(x) = 1$
if $x \ge 0$ and $\mathrm{sgn}(x) = -1$ if $x < 0$. If $X$ is a $d$-dimensional vector, then $\overrightarrow{\mathrm{sgn}}(X)$ denotes the vector
in $\{-1,1\}^d$ with $\overrightarrow{\mathrm{sgn}}(X)[i] = \mathrm{sgn}(X[i])$.

Our results concern arbitrary functions of $d$ halfspaces. Thus we have vectors $W_1, \dots, W_n, \Theta \in
\mathbb{R}^d$, and we are interested in functions $f : \{-1,1\}^d \to \{0,1\}$ of the vector $\overrightarrow{\mathrm{sgn}}(x_1 W_1 + \cdots + x_n W_n - \Theta)$,
which we abbreviate to $\overrightarrow{\mathrm{sgn}}(W \cdot X - \Theta)$ where $W = (W_1, \dots, W_n)$ and $X = (x_1, \dots, x_n)$.

Organization: We give an overview of our results and their proofs in Section 2. We prove the
multidimensional Berry-Esseen type theorems in Section 4. In Section 5 we prove a regularity lemma
for multiple halfspaces in the general setting of hypercontractive variables. We state the modified MZ
generator in Section 6 and analyze it using the machinery above in Section 7. In Section 8 we
show how to combine it with PRGs for branching programs to get our Theorems 1.2 and 1.3. We
prove Theorem 1.5 in Section 10. In Section 11 we show how our results apply to fooling the
uniform distribution on the sphere, and use this to derandomize the hardness result of [KS08].

2 Overview of the main results 

In this section, we give an overview of how we construct and analyze the following two types
of PRGs for functions of halfspaces under general product distributions: i) the modified
Meka-Zuckerman generator (in Section 2.1) and ii) the bounded independence generator (in Section 2.2).

2.1 The Meka-Zuckerman Generator 

There are five steps in the analysis: 

1. Discretize the distribution X so that it is the product of discrete distributions whose moments 
nearly match those of X . 

2. Prove a multidimensional version of the classical Berry-Esseen theorem, and a derandomization
thereof under general product distributions. This allows us to handle functions of regular
halfspaces. See Subsection 2.1.1.

3. Generalize the regularity lemma/critical index lemma (see [Ser07, DGJ+09]) to $d$ halfspaces
under general product distributions. This gives a small set of variables such that after
conditioning on these variables, each halfspace becomes either regular or close to a constant function.
See Subsection 2.1.2.

4. Use the regularity lemma to reduce analyzing functions of $d$ arbitrary halfspaces to analyzing
functions of $d$ (or fewer) regular halfspaces.

5. Finally, generalize the monotone trick from [MZ09], which previously worked only for a single
"monotone" branching program, to monotone functions of monotone branching programs. This
enables us to get seed length logarithmic in $1/\epsilon$. See Subsection 2.1.3.
2.1.1 Multi-Dimensional Berry-Esseen Theorem



The classic Berry-Esseen Theorem is a quantitative version of the Central Limit Theorem. This
theorem is essential in the analyses of [MZ09] and [DGJ+09] for one halfspace. Since we seek
to fool functions of several halfspaces, we prove a multidimensional version of the Berry-Esseen
theorem, which approximates the distribution of $\sum_i x_i W_i$. The error of the approximation is small
when all the halfspaces are regular (no coefficient is too large). While there are multidimensional
versions known, we were unable to find in the literature any theorems which we could use in a
"black-box" fashion. The reason for this is twofold. First, known results tend to focus on measuring the
difference between probability distributions vis-a-vis convex sets, whereas we are interested in more
specialized sets, unions of orthants. Second, results in the literature tend to assume a nonsingular
covariance matrix and/or have a dependence in the error bound on its least eigenvalue, whereas
we need to work with potentially singular covariance matrices. We believe this theorem could be
of independent interest.

Next we show how this theorem can be derandomized in a certain sense. This derandomization 
enables us to show that our modified MZ PRG fools regular halfspaces. 



2.1.2 Multi-Dimensional Critical Index 

The concept of critical index was introduced in the work of Servedio [Ser07]. It is used to
prove a regularity lemma for halfspaces, which asserts that every halfspace contains a head
consisting of constantly many variables, such that once these variables are set randomly, the
resulting function is either close to constant, or close to a regular halfspace. This lemma has
found numerous applications in complexity and learning theoretic questions related to halfspaces
[Ser07, OS08, FGRW09, DGJ+09, MZ09].

The obvious generalization of the one-dimensional theorem to multiple halfspaces would be to 
take the union of the heads of each halfspace. This does not work, since setting variables in a 
regular halfspace can make it irregular. We prove a multidimensional version of this lemma, which 
moreover holds in the setting of product distributions with bounded fourth moments. Our analysis 
shows that the lemma only requires some basic concentration and anti-concentration properties, 
which are enjoyed by any random variable with bounded fourth moments. 



2.1.3 Monotone Branching Programs 

The only known method to get logarithmic dependence on $1/\epsilon$ for PRGs for halfspaces, due to Meka
and Zuckerman, considers the natural branching program accepting a halfspace. This branching
program is "monotone," in the sense that in every layer the set of accepting suffixes forms a
total order under inclusion. Meka and Zuckerman showed that any monotone branching program
of arbitrary width can be sandwiched between two small-width monotone branching programs.
Therefore, PRGs for small-width branching programs, such as those by Nisan [Nis92], can be used.

Since we deal with several halfspaces, we get several monotone branching programs. We consider 
monotone functions of monotone branching programs, to encompass intersections of halfspaces. 
However, such functions are not necessarily computable by monotone branching programs. Nev- 
ertheless, we show how to sandwich such functions between two small-width branching programs, 
and thus can use the PRGs like Nisan's. 
2.2 Bounded Independence fools functions of halfspaces

2.2.1 Sandwiching "polynomials" 

To prove that bounded independence can fool functions of halfspaces (Theorem 1.5), we use the
"sandwiching polynomials" method as introduced by Bazzi [Baz09] and used by [DGJ+09]. However,
in our setting of general random variables it is not appropriate to use polynomials per se. The
essence of the sandwiching polynomial method is showing that only groups of $d$ random variables
need to be "simultaneously controlled". When the random variables are $\pm 1$-valued, controlling
subfunctions of at most $d$ random variables is equivalent to controlling polynomials of degree at most $d$.
But for random variables with more than two outcomes, a function of $d$ random variables requires
degree higher than $d$ in general, a price we should not be forced to pay. We instead introduce the
following notions:

Definition 2.1. Let $\Omega = \Omega_1 \times \cdots \times \Omega_n$ be a product set. We say that $p : \Omega \to \mathbb{R}$ is a $k$-junta if
$p(x_1, \dots, x_n)$ depends on at most $k$ of the $x_j$'s. We say that $p$ is a generalized polynomial of order
(at most) $k$ if it is expressible as a sum of $k$-juntas. In the remainder
of this section we typically drop the word "generalized" from "generalized polynomial", and add the
modifier "ordinary" when referring to "ordinary polynomials".

We now give the simple connection to fooling functions with bounded independence: 

Definition 2.2. Let $X = (x_1, \dots, x_n)$ be a vector of independent random variables, where $x_j$ has
range $\Omega_j$. Let $f : \Omega \to \mathbb{R}$, where $\Omega = \Omega_1 \times \cdots \times \Omega_n$. We say that polynomials $p_l, p_u : \Omega \to \mathbb{R}$ are
$\epsilon$-sandwiching for $f$ if

$$p_l(X) \le f(X) \le p_u(X) \text{ for all } X \in \Omega, \quad \text{and} \quad E[p_u(X)] - \epsilon \le E[f(X)] \le E[p_l(X)] + \epsilon.$$

Proposition 2.3. Suppose $p_l$, $p_u$ are $\epsilon$-sandwiching for $f$ as in Definition 2.2 and have order at
most $k$. Then $f$ is $\epsilon$-fooled by $k$-wise independence. I.e., if $Y = (y_1, \dots, y_n)$ is a vector of random
variables such that each marginal of the form $(y_{j_1}, \dots, y_{j_k})$ matches the corresponding marginal
$(x_{j_1}, \dots, x_{j_k})$, then

$$|E[f(X)] - E[f(Y)]| \le \epsilon.$$

Proof. Write $p_u = \sum_t q_t$ where each $q_t$ is a $k$-junta. Then

$$E[f(Y)] \le E[p_u(Y)] = E\Big[\sum_t q_t(Y)\Big] = \sum_t E[q_t(Y)] = \sum_t E[q_t(X)] = E[p_u(X)] \le E[f(X)] + \epsilon,$$

where in addition to the sandwiching properties of $p_u$ we used the fact that $q_t$ is a $k$-junta to deduce
$E[q_t(Y)] = E[q_t(X)]$. We obtain the bound $E[f(Y)] \ge E[f(X)] - \epsilon$ similarly, using $p_l$. □
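The key step of the proof, that a $k$-wise independent vector gives every sum of $k$-juntas the same expectation as the fully independent vector, can be checked exactly on a toy instance. The sketch below is an illustration under stated assumptions: the XOR distribution and the two test functions are hypothetical examples chosen here, not objects from this paper.

```python
from fractions import Fraction
from itertools import product

# X: uniform on {0,1}^3 (fully independent bits).
UNIFORM = list(product((0, 1), repeat=3))
# Y: two uniform bits plus their XOR. Every pair of coordinates of Y is
# uniform on {0,1}^2, so Y is pairwise (2-wise) independent but not 3-wise.
PAIRWISE = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

def expectation(p, support):
    """Exact expectation of p under the uniform distribution on `support`."""
    return sum(Fraction(p(x)) for x in support) / len(support)

def order_two(x):
    """A generalized polynomial of order 2: each summand reads at most 2 coordinates."""
    return x[0] * x[1] + 2 * x[1] * x[2] - x[0]

def order_three(x):
    """A single 3-junta, which pairwise independence need not preserve."""
    return x[0] * x[1] * x[2]
```

Here `expectation(order_two, UNIFORM)` and `expectation(order_two, PAIRWISE)` both equal 1/4, whereas the order-3 function has expectation 1/8 under `UNIFORM` and 0 under `PAIRWISE`.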

2.2.2 Upper polynomials for intersections suffice 

We begin with a trivial observation: 

Proposition 2.4. Let $\mathcal{C}$ be a class of functions $\Omega \to \{0,1\}$, and suppose that for every $f \in \mathcal{C}$ we
have just the "upper sandwiching polynomial", $p_u$, of an $\epsilon$-sandwiching pair for $f$. Then if $\mathcal{C}$ is
closed under Boolean negation, we obtain a matching "lower polynomial" $p_l$ of the same order as
$p_u$ automatically.
This is simply because the negation $1 - f$ also lies in $\mathcal{C}$: given its upper polynomial $p_u'$, we may
take $p_l = 1 - p_u'$ as a lower polynomial for $f$. Since the Boolean negation of a
halfspace is a halfspace, this observation could have been used for slight simplification in [DGJ+09].

Our Theorem 1.5 is concerned with the class of 0-1 functions $f$ computable as size-$s$, depth-$d$
functions of halfspaces. This class is closed under Boolean negation; hence it suffices for us to
obtain upper sandwiching polynomials. Furthermore, every such $f$ can be written as $f = \sum_{t=1}^{s'} H_t$,
where $s' \le s$ and $H_t$ is an intersection (AND) of up to $d$ halfspaces. To see this, simply sum the
indicator function for each root-to-leaf path in the decision tree (this again uses the fact that the
negation of a halfspace is a halfspace). Thus if we have $(\epsilon/s)$-sandwiching upper polynomials of
order $k$ for each $H_t$, by summing them we obtain an $\epsilon$-sandwiching upper polynomial for $f$ of the
same order. Hence to prove our main Theorem 1.5 it suffices to prove the following:

Theorem 2.5. Suppose $f$ is the intersection of $d$ halfspaces $h_1, \dots, h_d$ over the independent random
variables $x_1, \dots, x_n$. Suppose $\alpha$ is as in Theorem 1.5. Then there exists an $\epsilon$-sandwiching upper
polynomial for $f$ of order $k \le \tilde{O}(d^4/\epsilon^2) \cdot \mathrm{poly}(1/\alpha)$.
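The reduction from decision trees to intersections used before Theorem 2.5 can be sketched in Python. The tree encoding (leaves are the integers 0 and 1, internal nodes are triples) and all function names are assumptions made for this illustration only. The sketch enumerates the root-to-1-leaf paths and checks that the sum of their indicator functions agrees with the tree.

```python
from itertools import product

def halfspace(weights, theta):
    """Return h with h(x) = 1[w . x > theta]."""
    def h(x):
        return int(sum(w * xi for w, xi in zip(weights, x)) > theta)
    return h

def evaluate(tree, x):
    """Evaluate a tree whose leaves are 0/1 and whose internal nodes are
    triples (h, subtree_if_h_is_0, subtree_if_h_is_1)."""
    while not isinstance(tree, int):
        h, low, high = tree
        tree = high if h(x) else low
    return tree

def accepting_paths(tree, prefix=()):
    """Yield each root-to-1-leaf path as a tuple of (h, required_value) pairs."""
    if isinstance(tree, int):
        if tree == 1:
            yield prefix
        return
    h, low, high = tree
    yield from accepting_paths(low, prefix + ((h, 0),))
    yield from accepting_paths(high, prefix + ((h, 1),))

def sum_of_intersections(tree, x):
    """Sum over accepting paths of the AND of the path's halfspace conditions.
    A condition h(x) == 0 is itself a halfspace condition (the negation of h)."""
    return sum(int(all(h(x) == v for h, v in path)) for path in accepting_paths(tree))

# A size-4, depth-2 tree over three halfspaces (hypothetical example).
h1 = halfspace([1, 1, 0], 0)
h2 = halfspace([0, 1, -1], 0)
h3 = halfspace([1, 0, 1], 1)
tree = (h1, (h2, 0, 1), (h3, 1, 0))
```

Because distinct root-to-leaf paths are satisfied by disjoint sets of inputs, at most one summand is nonzero at any point, so the sum is again 0-1 valued.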



2.2.3 Polynomial construction techniques 

Suppose for simplicity we are only concerned with the intersection $f$ of $d$ halfspaces $h_1, \dots, h_d$ over
uniform random $\pm 1$ bits $x_j$. The work of Diakonikolas et al. [DGJ+09] implies that there is an
$\epsilon_0$-sandwiching upper polynomial $p_i$ of order $\tilde{O}(1/\epsilon_0^2)$ for each $h_i$. To obtain an $\epsilon$-sandwiching upper
polynomial for the intersection $h_1 h_2 \cdots h_d$, a natural first idea is simply to try $p = p_1 p_2 \cdots p_d$. This
is certainly an upper-bounding polynomial; however, the $\epsilon$-sandwiching aspect is unclear. We can
begin the analysis as follows. Let $h_i = h_i(X)$ and $p_i = p_i(X)$. By telescoping,

$$E[p_1 \cdots p_d] - E[h_1 \cdots h_d] = E[(p_1 - h_1) p_2 \cdots p_d] + \cdots + E[h_1 \cdots h_{i-1} (p_i - h_i) p_{i+1} \cdots p_d] + \cdots + E[h_1 \cdots h_{d-1} (p_d - h_d)]. \qquad (1)$$

Now the last term here could be upper-bounded as

$$E[h_1 \cdots h_{d-1} (p_d - h_d)] \le E[p_d - h_d] \le \epsilon_0,$$

since each $0 \le h_i \le 1$ with probability 1. But we cannot make an analogous bound for the
remaining terms because we have no a priori control over the values of the $p_i$'s beyond the individual
sandwiching inequalities

$$E[p_i - h_i] \le \epsilon_0.$$

Nevertheless, we will be able to make this strategy work by establishing additional boundedness
conditions on the polynomials $p_i$; specifically, that each $p_i$ exceeds $1 + 1/d^2$ extremely rarely, and
that even the high $2d$-norm of $p_i$ is not much more than 1.
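The telescoping identity (1) already holds pointwise, before expectations are taken, and this is easy to confirm numerically. The following Python sketch uses exact rational arithmetic; the function name and the sample values of $p_i$ and $h_i$ are hypothetical.

```python
from fractions import Fraction
from math import prod

def telescoping_terms(p, h):
    """Return the d terms h_1...h_{i-1} (p_i - h_i) p_{i+1}...p_d for i = 1..d.
    Their sum equals p_1...p_d - h_1...h_d at every point."""
    d = len(p)
    return [prod(h[:i]) * (p[i] - h[i]) * prod(p[i + 1:]) for i in range(d)]

# Values of three upper polynomials and three halfspaces at one point X,
# chosen so that p_i >= h_i and h_i is 0 or 1.
p = [Fraction(11, 10), Fraction(1), Fraction(6, 5)]
h = [Fraction(1), Fraction(0), Fraction(1)]
terms = telescoping_terms(p, h)
```

In this instance the terms are 3/25, 6/5 and 0, summing to 33/25. The middle term illustrates the difficulty noted above: it carries the factor $p_3 = 6/5 > 1$, which the sandwiching inequality for $p_2$ alone does not control.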



Establishing these extra properties requires significant reworking of the construction in [DGJ+09].
Even in the case of uniform random $\pm 1$ bits, the calculations are not straightforward, since the
upper sandwiching polynomials implied by [DGJ+09] are only fully explicit in the case of regular
halfspaces. And to handle general random variables $x_j$, we need more than just our new Regularity
Lemma 5.3 for halfspaces. We also need to assume a stronger hypercontractivity property of the
random variables to ensure they have rapidly decaying tails.
3 Hypercontractivity



The notion of hypercontractive random variables was introduced in [KS88] and developed by
Krakowiak, Kwapien, and Szulga:

Definition 3.1. We say that a real random variable $x$ is $(p, q, \eta)$-hypercontractive for $1 \le q \le
p < \infty$ and $0 < \eta < 1$ if $\|x\|_p < \infty$, and for all $a \in \mathbb{R}$, $\|a + \eta x\|_p \le \|a + x\|_q$.

In this paper we will be almost exclusively concerned with the simplest case, p = 4, q = 2. Let 
us abbreviate the definition in this case (and also exclude constantly-0 random variables): 

Definition 3.2. A real random variable $x$ is $\eta$-HC for $0 < \eta < 1$ if $0 < \|x\|_4 < \infty$ and for all
$a \in \mathbb{R}$, $\|a + \eta x\|_4 \le \|a + x\|_2$, i.e. $E[(a + \eta x)^4] \le E[(a + x)^2]^2$.

Essentially, a mean-zero real random variable is $\eta$-HC with large $\eta$ if and only if it has a small 4th
moment (compared to its 2nd moment). Random variables with small 4th moment are known to
enjoy some basic concentration and anti-concentration properties. We work with hypercontractivity
rather than 4th moments because it tends to slightly shorten proofs and improve constants; the
main convenience is that a linear combination of independent $\eta$-HC random variables is also $\eta$-HC.

Here we list some basic and useful properties of $\eta$-HC random variables, all of which have
elementary proofs. Note that items 3 and 4 of Fact 3.3 imply that the upper bound on the 4th moment
satisfies $C = \Theta(1/\eta^4)$.

Fact 3.3. [KS88, MOO05, Wol06a, Wol06b]

1. If $x$ is $\eta$-HC then it is also $\eta'$-HC for all $\eta' < \eta$.

2. If $x$ is $\eta$-HC then $x$ is centered, $E[x] = 0$.

3. If $x$ is $\eta$-HC then $E[x^4] \le (1/\eta)^4 E[x^2]^2$.

4. Conversely, if $E[x] = 0$ and $E[x^4] \le (1/\eta)^4 E[x^2]^2$, then $x$ is $(\eta/2\sqrt{3})$-HC. If $x$ is also
symmetric (i.e., $-x$ has the same distribution as $x$) then $x$ is $\min(\eta, 1/\sqrt{3})$-HC.

5. If $x$ is $\pm 1$ with probability 1/2 each, then $x$ is $(1/\sqrt{3})$-HC. The same is true if $x$ has the standard
Gaussian distribution or the uniform distribution on $[-1, 1]$.

6. If $x$ is $\eta$-HC then in fact $\eta \le 1/\sqrt{3}$.

7. If $x$ is a centered discrete random variable and $\alpha \le 1/2$ is the least nonzero value of $x$'s
probability mass function, then $x$ is $\eta$-HC for $\eta = \alpha^{1/4}/2\sqrt{3}$.

8. If $x_1, \dots, x_n$ are independent $\eta$-HC random variables, then so is $c_1 x_1 + \cdots + c_n x_n$ for any real
constants $c_1, \dots, c_n$, not all 0. (Indeed, 4-wise independence suffices.)

9. If $x$ is $\eta$-HC, and $y$ is a random variable with the same $r$th moments as $x$ for all $r = 0, 1, 2, 3, 4$,
then $y$ is also $\eta$-HC.
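Items 5 and 6 can be checked by hand for a uniform $\pm 1$ bit, and the following Python sketch does the arithmetic exactly. It assumes only Definition 3.2; the function names and the test grid are illustrative choices. Working with $\eta^2$ rather than $\eta$ keeps every quantity rational.

```python
from fractions import Fraction

def lhs(a, eta_sq):
    """E[(a + eta*x)^4] for x uniform on {-1, +1}, written in terms of eta^2.
    Averaging (a + eta)^4 and (a - eta)^4 leaves a^4 + 6 a^2 eta^2 + eta^4."""
    return a**4 + 6 * a**2 * eta_sq + eta_sq**2

def rhs(a):
    """E[(a + x)^2]^2 for the same x, using E[(a + x)^2] = a^2 + 1."""
    return (a**2 + 1) ** 2

def holds_on_grid(eta_sq, grid):
    """Check the eta-HC inequality of Definition 3.2 at every a in the grid."""
    return all(lhs(a, eta_sq) <= rhs(a) for a in grid)

GRID = [Fraction(k, 4) for k in range(-20, 21)]
```

With $\eta^2 = 1/3$ the inequality reads $a^4 + 2a^2 + 1/9 \le a^4 + 2a^2 + 1$ and holds for every $a$. With $\eta^2 = 1/2$ it fails at $a = 1$, where the left side is 17/4 and the right side is 4, in line with item 6.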

The notion of hypercontractivity can be extended to $\mathbb{R}^d$-valued random variables:

Definition 3.4. An $\mathbb{R}^d$-valued random variable $X$ is $\eta$-HC for $0 < \eta < 1$ if $\|X\|_4 < \infty$ and for all
$A \in \mathbb{R}^d$, $\|A + \eta X\|_4 \le \|A + X\|_2$.

We require the following facts about vector-valued hypercontractivity: 

Fact 3.5. [Szu90]

1. If $W \in \mathbb{R}^d$ is a fixed vector and $x$ is an $\eta$-HC real random variable, then $X = xW$ is $\eta$-HC.

2. If $X_1, \dots, X_n$ are independent $\eta$-HC random vectors, then so is $c_1 X_1 + \cdots + c_n X_n$ for any real
constants $c_1, \dots, c_n$. (Again, 4-wise independence also suffices.)
Hypercontractive real random variables possess the following good concentration and
anti-concentration properties.

Proposition 3.6. If $x$ is $\eta$-HC then for all $t > 0$, $\Pr[|x| \ge t\|x\|_2] \le \frac{1}{\eta^4 t^4}$.

Proof. Apply Markov to the event "$x^4 \ge t^4 E[x^2]^2$" and use item 3 of Fact 3.3. □

Proposition 3.7. If $x$ is $\eta$-HC then for all $\theta \in \mathbb{R}$ and $0 < t < 1$, $\Pr[|x - \theta| > t\|x\|_2] \ge \eta^4 (1 - t^2)^2$.

Proof. By scaling $x$ it suffices to consider the case $\|x\|_2 = 1$. Consider the random variable
$y = (x - \theta)^2$. We have

$$E[y] = E[x^2] - 2\theta E[x] + \theta^2 = 1 + \theta^2,$$

$$E[y^2] = \eta^{-4} E[(-\eta\theta + \eta x)^4] \le \eta^{-4} E[(-\eta\theta + x)^2]^2 = \eta^{-4} (1 + \eta^2\theta^2)^2,$$

where we used the fact that $x$ is $\eta$-HC in the second calculation (and then used the first calculation
again). We now apply the Paley-Zygmund inequality (with parameter $0 < t^2/(1 + \theta^2) < 1$):

$$\Pr[|x - \theta| > t] = \Pr[y > t^2] = \Pr\left[y > \frac{t^2}{1 + \theta^2} E[y]\right] \ge \left(1 - \frac{t^2}{1 + \theta^2}\right)^2 \frac{E[y]^2}{E[y^2]} \ge \left(1 - \frac{t^2}{1 + \theta^2}\right)^2 \frac{(1 + \theta^2)^2}{(\eta^{-2} + \theta^2)^2} = \left(\frac{\eta^2 (1 - t^2) + \eta^2\theta^2}{1 + \eta^2\theta^2}\right)^2. \qquad (2)$$

Treat $\eta$ and $t$ as fixed and $\theta$ as varying. Writing $u = \eta^2 (1 - t^2)$, we have $0 < u < 1$; hence the
fraction $(u + \eta^2\theta^2)/(1 + \eta^2\theta^2)$ appearing in (2) is positive and increasing as $\eta^2\theta^2$ increases. Thus it
is minimized when $\theta = 0$; substituting this into (2) gives the claimed lower bound. □
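As a sanity check of Proposition 3.7, the bound can be compared with the exact probability for a uniform $\pm 1$ bit, which has $\|x\|_2 = 1$ and is $(1/\sqrt{3})$-HC by Fact 3.3. The Python sketch below is an illustration under these assumptions; the function names and grids are hypothetical.

```python
from fractions import Fraction

def prob_far(theta, t):
    """Exact Pr[|x - theta| > t] for x uniform on {-1, +1}."""
    return Fraction(sum(abs(x - theta) > t for x in (-1, 1)), 2)

def lower_bound(t):
    """eta^4 (1 - t^2)^2 with eta = 1/sqrt(3), so eta^4 = 1/9."""
    return Fraction(1, 9) * (1 - t**2) ** 2

THETAS = [Fraction(k, 4) for k in range(-12, 13)]
TS = [Fraction(k, 10) for k in range(1, 10)]

def bound_holds():
    """Check Proposition 3.7 at every (theta, t) pair of the grid."""
    return all(prob_far(th, t) >= lower_bound(t) for th in THETAS for t in TS)
```

For example, at $\theta = 1$ and $t = 1/2$ only the outcome $x = -1$ is far from $\theta$, so the probability is 1/2, while the bound gives $(1/9)(3/4)^2 = 1/16$.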



4 The Multi-Dimensional Berry-Esseen Theorem 

In this section we prove a Berry-Esseen-style result in the setting of multidimensional random
variables, and a derandomization of it.

We assume the following setup: $X_1, \dots, X_n$ are independent $\mathbb{R}^d$-valued $\eta$-HC random variables,
not necessarily identically distributed, satisfying $E[X_j] = 0$ for all $j \in [n]$. We let $S = X_1 + \cdots + X_n$.
We write $M_j = \mathrm{Cov}[X_j] \in \mathbb{R}^{d \times d}$ for the covariance matrix of $X_j$, which is positive semidefinite.
We also write $M = \mathrm{Cov}[S]$ for the covariance matrix of $S$; by the independence and mean-zero
assumptions we have $M = M_1 + \cdots + M_n$. We will also assume that

$$M[i, i] = \sum_{j=1}^n E[X_j[i]^2] = 1 \quad \text{for all } i \in [d].$$

If we write $\sigma_j^2 = \|X_j\|_2^2$, it follows that $\sum_{j=1}^n \sigma_j^2 = d$. We introduce new independent random
variables $G_1, \dots, G_n$, where $G_j$ is a $d$-dimensional Gaussian random variable with covariance matrix
$M_j$; we also write $G = G_1 + \cdots + G_n$. We say that $A \subseteq \mathbb{R}^d$ is a translate of a union of orthants
if there exists some vector $\Theta \in \mathbb{R}^d$ such that $X \in A$ depends only on $\overrightarrow{\mathrm{sgn}}(X - \Theta)$.

Theorem 4.1. Let $S$ and $G$ be as above. Let $A \subseteq \mathbb{R}^d$ be a translate of a union of orthants. Then

$$|\Pr[S \in A] - \Pr[G \in A]| \le O(\eta^{-1/2} d^{13/8}) \cdot \Big(\sum_{j=1}^n \sigma_j^4\Big)^{1/8}.$$
We now show that this result can be "derandomized" using the output of the MZ generator $Y$
in place of X. We describe here a simplified version of the output of their generator. 



Definition 4.2. A family H = {h : [n] — > [t]} of hash functions is 6-collision preserving if 



Efficient constructions of size $|\mathcal{H}| = O(nt)$ are known for any constant $b \ge 1$. The value $b = 1$ is optimal, 
and can be achieved by a pairwise independent family. In our construction we use $b = 1$, but we 
will need larger $b$ in our analysis. A hash function induces a partition of $[n]$. 

We choose a partition $H_1, \dots, H_t$ of $[n]$ into $t$ buckets using a $b$-collision preserving family of 
hash functions (where $b \le 2$). The vector of variables $\{Y_j\}_{j \in H_\ell}$ is generated 4-wise independently. 
There is full independence across different buckets. Let $T = Y_1 + \cdots + Y_n$. 
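
To make the two conditions of a collision preserving family concrete, the sketch below enumerates a small textbook universal family $x \mapsto ((ax + c) \bmod p) \bmod t$ with $a \ne 0$ and computes the smallest constants for which each condition holds. This family, the parameters $p = 31$, $t = 4$, $n = 8$, and the names `family` and `collision_constants` are assumptions made for illustration; it is not the family used in the paper.

```python
from fractions import Fraction
from itertools import combinations

def family(p, t):
    """All maps x -> ((a*x + c) mod p) mod t with a != 0."""
    return [(lambda x, a=a, c=c: ((a * x + c) % p) % t)
            for a in range(1, p) for c in range(p)]

def collision_constants(n, p, t):
    hs = family(p, t)
    size = len(hs)
    # Condition 1: max over (i, l) of Pr[h(i) = l], rescaled by t.
    b_marginal = max(Fraction(sum(h(i) == l for h in hs), size) * t
                     for i in range(n) for l in range(t))
    # Condition 2: max over pairs i != j of Pr[h(i) = h(j)], rescaled by t.
    b_pair = max(Fraction(sum(h(i) == h(j) for h in hs), size) * t
                 for i, j in combinations(range(n), 2))
    return b_marginal, b_pair

b_marginal, b_pair = collision_constants(n=8, p=31, t=4)
print(b_marginal, b_pair)
```

For these parameters the pair condition holds with constant below 1, while the marginal condition needs a constant slightly above 1 because $t$ does not divide $p$; this is one reason an analysis may want to allow $b$ a little larger than 1.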

Theorem 4.3. Let $T$ and $G$ be as above. Let $A \subseteq \mathbb{R}^d$ be a translate of a union of orthants. Then 



Putting these two theorems together, we have shown the following statement. 

Theorem 4.4. Let $S$ and $T$ be as above. Let $A \subseteq \mathbb{R}^d$ be a translate of a union of orthants. Then 



In the rest of the section, we prove the above theorems. Our aim is not to get the best bounds possible 
(for which one might pursue the methods of Bentkus [Ben04]). Rather, we aim to provide a simple 
method which achieves a reasonable bound, and thus use the Lindeberg method, following [MOO05, 
Mos08] very closely. 

4.1 The basic lemma 

In what follows, $K$ will denote a $d$-dimensional multi-index $(k_1, \dots, k_d) \in \mathbb{N}^d$, with $|K|$ denoting 
$k_1 + \cdots + k_d$ and $K!$ denoting $k_1! k_2! \cdots k_d!$. Given a vector $H \in \mathbb{R}^d$, the expression $H^K$ denotes 
$\prod_{i=1}^d H[i]^{k_i}$. Given a function $\psi : \mathbb{R}^d \to \mathbb{R}$, the expression $\psi^{(K)}$ denotes the mixed partial derivative 
taken $k_i$ times in the $i$th coordinate; we will always assume $\psi$ is smooth enough that the order of 
the derivatives does not matter. 

The following lemma is essentially proven in, e.g., [Mos08, Theorem 4.1]. To obtain it, simply 
repeat Mossel's proof in the degree 1 case, until equation (31). (Although Mossel assumes that 
the covariance matrices $M_j$ are identity matrices, this is not actually necessary; it suffices that 
$\mathrm{Cov}[X_j] = \mathrm{Cov}[G_j]$.) Then instead of using hypercontractivity, skip directly to summing the 
error terms over all coordinates. 



1. For all i£[n],£e[t], Pr heu n[h(i) = I] < b/t. 



2. For all i ^ j E [n], Pr hEu n[h(i) = h(j)} < b/t. 





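Lemma 4.5. Let $\psi : \mathbb{R}^d \to \mathbb{R}$ be a $C^3$ function with $|\psi^{(K)}| \le b$ for all $|K| = 3$. Then 

\[ |\mathbf{E}[\psi(S)] - \mathbf{E}[\psi(G)]| \le b \sum_{|K|=3} \frac{1}{K!} \sum_{j=1}^{n} \Big( \mathbf{E}[|X_j^K|] + \mathbf{E}[|G_j^K|] \Big). \tag{3} \]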



We further deduce: 
Corollary 4.6. In the setting of Lemma 4.5, 

\[ |\mathbf{E}[\psi(S)] - \mathbf{E}[\psi(G)]| \le 2bd^3 \sum_{j=1}^{n} \|X_j\|_3^3. \]



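Proof. Fix a multi-index $K$ with $|K| = 3$ and also an index $j$. We will show that 

\[ \mathbf{E}[|X_j^K|] + \mathbf{E}[|G_j^K|] \le 2.6 \|X_j\|_3^3. \tag{4} \]

Substituting this into (3) completes the proof, since 

\[ b \sum_{|K|=3} \frac{2.6}{K!} \le 2bd^3. \]

Let the nonzero coordinates in $K$ be $i_1, i_2, i_3 \in [d]$, written with multiplicity. Write also 

\[ \sigma_i^2 = M_j[i,i] = \mathbf{E}[G_j[i]^2] = \mathbf{E}[X_j[i]^2]. \]

On one hand, by Hölder we have 

\[ \mathbf{E}[|G_j^K|] = \mathbf{E}[|G_j[i_1] G_j[i_2] G_j[i_3]|] \le \sqrt[3]{\mathbf{E}[|G_j[i_1]|^3] \, \mathbf{E}[|G_j[i_2]|^3] \, \mathbf{E}[|G_j[i_3]|^3]}. \]

Note that the distribution of $G_j[i_1]$ is $N(0, \sigma_{i_1}^2)$. It is elementary that such a random variable has 
third absolute moment equal to $2\sqrt{2/\pi} \cdot \sigma_{i_1}^3 \le 1.6 \sigma_{i_1}^3$. As the same is true for $i_2$ and $i_3$, we conclude 
that 

\[ \mathbf{E}[|G_j^K|] \le 1.6 \, \sigma_{i_1} \sigma_{i_2} \sigma_{i_3}. \tag{5} \]

On the other hand, we can similarly upper-bound 

\[ \mathbf{E}[|X_j^K|] \le \sqrt[3]{\mathbf{E}[|X_j[i_1]|^3] \, \mathbf{E}[|X_j[i_2]|^3] \, \mathbf{E}[|X_j[i_3]|^3]}. \tag{6} \]

But 

\[ \mathbf{E}[|X_j[i_1]|^3] \, \mathbf{E}[|X_j[i_2]|^3] \, \mathbf{E}[|X_j[i_3]|^3] \ge \mathbf{E}[|X_j[i_1]|^2]^{3/2} \, \mathbf{E}[|X_j[i_2]|^2]^{3/2} \, \mathbf{E}[|X_j[i_3]|^2]^{3/2}, \]

and hence from (5) and (6) we conclude 

\[ \mathbf{E}[|X_j^K|] + \mathbf{E}[|G_j^K|] \le 2.6 \sqrt[3]{\mathbf{E}[|X_j[i_1]|^3] \, \mathbf{E}[|X_j[i_2]|^3] \, \mathbf{E}[|X_j[i_3]|^3]}. \]

Finally, we clearly have $|X_j[i_1]| \le \|X_j\|$ always, and similarly for $i_2$, $i_3$. Hence 

\[ \mathbf{E}[|X_j^K|] + \mathbf{E}[|G_j^K|] \le 2.6 \sqrt[3]{\mathbf{E}[\|X_j\|^3] \, \mathbf{E}[\|X_j\|^3] \, \mathbf{E}[\|X_j\|^3]} = 2.6 \|X_j\|_3^3, \]

confirming (4). □ 

Corollary 4.7. In the setting of Lemma 4.5, 

\[ |\mathbf{E}[\psi(S)] - \mathbf{E}[\psi(G)]| \le 2bd^{7/2} \sqrt{\sum_{j=1}^{n} \|X_j\|_4^4}. \]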

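Proof. Using Cauchy-Schwarz twice, 

\[ \sum_{j=1}^{n} \|X_j\|_3^3 = \sum_{j=1}^{n} \mathbf{E}\big[\|X_j\| \cdot \|X_j\|^2\big] \le \sum_{j=1}^{n} \sqrt{\mathbf{E}[\|X_j\|^2]} \sqrt{\mathbf{E}[\|X_j\|^4]} \le \sqrt{\sum_{j=1}^{n} \mathbf{E}[\|X_j\|^2]} \sqrt{\sum_{j=1}^{n} \mathbf{E}[\|X_j\|^4]} = \sqrt{d} \sqrt{\sum_{j=1}^{n} \|X_j\|_4^4}, \]

where we also used $\sum_j \sigma_j^2 = d$. The claim now follows from Corollary 4.6. 

□ 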
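
Two constants used in the proof of Corollary 4.6 can be checked numerically: the third absolute moment of a standard Gaussian, $2\sqrt{2/\pi} \approx 1.596$, and the identity $\sum_{|K|=3} 1/K! = d^3/6$ (a consequence of the multinomial theorem), which is what makes $b \sum_{|K|=3} 2.6/K! \le 2bd^3$ hold with room to spare. The sketch below is only a check under those stated facts; the function names and the midpoint-rule integration are illustrative choices.

```python
import math
from itertools import product

def inverse_factorial_sum(d):
    """Sum of 1/K! over multi-indices K in N^d with |K| = 3."""
    total = 0.0
    for K in product(range(4), repeat=d):
        if sum(K) == 3:
            total += 1 / math.prod(math.factorial(k) for k in K)
    return total

def gaussian_abs_third_moment(steps=200_000, cutoff=12.0):
    """Midpoint-rule estimate of E|g|^3 for g ~ N(0, 1)."""
    h = 2 * cutoff / steps
    total = 0.0
    for i in range(steps):
        x = -cutoff + (i + 0.5) * h
        total += abs(x) ** 3 * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

m3 = gaussian_abs_third_moment()
print(m3, 2 * math.sqrt(2 / math.pi))
for d in (1, 2, 3, 4):
    print(d, inverse_factorial_sum(d), d**3 / 6)
```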



4.2 Derandomization and hypercontractivity 

We now show that this result can be "derandomized" in a certain sense. This idea is essentially 
due to Meka and Zuckerman [MZ09, Sec. 4.1]. 

Definition 4.8. We say that the sequences of $\mathbb{R}^d$-valued random vectors $X_1, \dots, X_n$ and $Y_1, \dots, Y_n$ 
satisfy the $r$-matching-moments condition, $r \in \mathbb{N}$, if the following holds: $\mathbf{E}[\mathcal{X}^K] = \mathbf{E}[\mathcal{Y}^K]$ for 
all multi-indices $|K| \le r$, where $\mathcal{X}$ is the $\mathbb{R}^{dn}$-valued random vector gotten by concatenating 
$X_1, \dots, X_n$, and $\mathcal{Y}$ is defined similarly. 

In this section, we suppose that $Y_1, \dots, Y_n$ satisfy the 4-matching-moments condition with 
respect to $X_1, \dots, X_n$. We will not suppose that they are independent, but rather that they have 
some limited independence. Let $T = Y_1 + \cdots + Y_n$. 

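Proposition 4.9. Let $H_1, \dots, H_t$ form a partition of $[n]$, and write $Z_\ell = \sum_{j \in H_\ell} Y_j$. Assume that 
$Z_1, \dots, Z_t$ are independent. Then 

\[ |\mathbf{E}[\psi(T)] - \mathbf{E}[\psi(G)]| \le 2bd^{7/2} \sqrt{\sum_{\ell=1}^{t} \Big\| \sum_{j \in H_\ell} X_j \Big\|_4^4}. \]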



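Proof. We simply apply Corollary 4.7 to the random variables $Z_1, \dots, Z_t$. To check that it is 
applicable, we note the following: The random variables are independent. They satisfy $\mathbf{E}[Z_\ell] = 0$, 
because each $\mathbf{E}[Y_j] = 0$ by 1-matching-moments. The covariance matrix $\sum_{\ell=1}^{t} \mathrm{Cov}[Z_\ell] = M$, by 
2-matching-moments. 

Thus Corollary 4.7 gives 

\[ |\mathbf{E}[\psi(T)] - \mathbf{E}[\psi(G)]| \le 2bd^{7/2} \sqrt{\sum_{\ell=1}^{t} \|Z_\ell\|_4^4}. \]

But for each $\ell$, 

\[ \|Z_\ell\|_4^4 = \Big\| \sum_{j \in H_\ell} Y_j \Big\|_4^4 = \mathbf{E}\Big[ \Big\langle \sum_{j \in H_\ell} Y_j, \sum_{j \in H_\ell} Y_j \Big\rangle^2 \Big] = \mathbf{E}\Big[ \Big\langle \sum_{j \in H_\ell} X_j, \sum_{j \in H_\ell} X_j \Big\rangle^2 \Big] = \Big\| \sum_{j \in H_\ell} X_j \Big\|_4^4, \]

using 4-matching-moments, completing the proof. 

□ 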



Remark 4.10. The full 4-matching-moments condition is not essential for our results; it would 
suffice to have 2-matching-moments, along with a good upper bound on the 4th moments of the Yj 's 
with respect to those of the Xj 's. 

We can simplify the previous bounds if we assume hypercontractivity. 

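Corollary 4.11. If we additionally assume that the random vectors $X_1, \dots, X_n$ are $\eta$-HC, then 
we have 

\[ |\mathbf{E}[\psi(S)] - \mathbf{E}[\psi(G)]| \le (2bd^{7/2}/\eta^2) \sqrt{\sum_{j=1}^{n} \sigma_j^4}, \]

\[ |\mathbf{E}[\psi(T)] - \mathbf{E}[\psi(G)]| \le (2bd^{7/2}/\eta^2) \sqrt{\sum_{\ell=1}^{t} \Big( \sum_{j \in H_\ell} \sigma_j^2 \Big)^2}. \]

Proof. We prove only the second statement, the first being simpler. It suffices to show 

\[ \Big\| \sum_{j \in H_\ell} X_j \Big\|_4^4 \le (1/\eta)^4 \Big( \sum_{j \in H_\ell} \sigma_j^2 \Big)^2. \]

Since the random variables $\{X_j : j \in H_\ell\}$ are independent and $\eta$-HC, it follows that the (vector- 
valued) random variable $\sum_{j \in H_\ell} X_j$ is $\eta$-HC. Hence 

\[ \Big\| \sum_{j \in H_\ell} X_j \Big\|_4^4 \le (1/\eta)^4 \Big\| \sum_{j \in H_\ell} X_j \Big\|_2^4. \]

But 

\[ \Big\| \sum_{j \in H_\ell} X_j \Big\|_2^2 = \sum_{j \in H_\ell} \|X_j\|_2^2 = \sum_{j \in H_\ell} \sigma_j^2 \]

by the Pythagorean Theorem. 

□ 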



We now consider the case when the partition $H_1, \dots, H_t$ is chosen randomly using a $b$-collision 
preserving family of hash functions (see Definition 4.2). 



Proposition 4.12. In the setting of Corollary 4.11, if the partition $H_1, \dots, H_t$ is chosen using a 
$b$-collision preserving family of hash functions, then 



|E[V>(T)] - E[V(G)]| < (266 1 / 2 d 7 / 2 /?? 2 



/d 2 



where the expectation $\mathbf{E}[\psi(T)]$ is with respect to both the choice of $H_1, \dots, H_t$ and $Y_1, \dots, Y_n$. 
Proof. By the triangle inequality for real numbers, it suffices to show 



E 

Hi,...,Ht 



e( e ^ 2 



=1 ^jeHt 

By Cauchy-Schwarz, this reduces to showing 



3 



< 



i 



E 

H u ...,H t 



e(e oj 



ci 2 n 
But 



E 

Hi,... ,H t 



E ( E of 

1=1 K jeH e 



™ „\ 2 

E 1 {jeH i }<y\ 

3=1 



t n 



£=131,32=1 



t I , n 




^E + E ^AEmneH^en^ <b±of+\ E 44 < x +6 E4 

<=1 V J=l / ii^J 2 1=1 3=1 3\+32 3=1 

as needed, because 



E44< E- 



31+32 



k3 = 1 



□ 



4.3 Smoothing 

Ideally we would like to use the results from the previous sections with $\psi$ equal to certain indicator 
functions $\chi : \mathbb{R}^d \to \{0, 1\}$; however, these are not $C^3$. As usual in the Lindeberg method (see, 
e.g., [MOO05]), we overcome this by working with mollified versions of these functions. For most of 
this section, we will work with our un-derandomized result, the statement about $S$ in Corollary 4.7. 
Identical considerations apply to the statement about $T$ in Proposition 4.12, and we will draw the 
necessary conclusions at the end. 

Let $\xi : \mathbb{R} \to \mathbb{R}$ be the "standard mollifier", a smooth density function supported on $[-1, 1]$. We 
will use the fact that there is some universal constant $b_0$ such that $\int |\xi^{(k)}| \, dx \le b_0$ for $k = 1, 2, 3$ 
(where $\xi^{(k)}$ denotes the $k$th derivative of $\xi$). Given $\epsilon > 0$ we define $\xi_\epsilon(x) = \xi(x/\epsilon)/\epsilon$, the standard 
mollifier with support $[-\epsilon, \epsilon]$. Finally, define the density function $\Xi_\epsilon$ on $\mathbb{R}^d$ by $\Xi_\epsilon(x_1, \dots, x_d) = 
\prod_{i=1}^{d} \xi_\epsilon(x_i)$. We now prove an elementary lemma: 



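Lemma 4.13. Let $\chi : \mathbb{R}^d \to [-1, 1]$ be measurable, let $\epsilon > 0$, and define $\psi = \Xi_\epsilon * \chi$, a smooth 
function. Then for any multi-index $|K| = 3$ we have $|\psi^{(K)}| \le (b_0/\epsilon)^3$. 

Proof. Using the fact that $|\chi| \le 1$ everywhere, we have 

\[ |\psi^{(K)}| = |\Xi_\epsilon^{(K)} * \chi| \le \int |\Xi_\epsilon^{(K)}| = \prod_{i=1}^{d} \int \Big| \frac{\partial^{k_i}}{\partial x_i^{k_i}} \xi_\epsilon(x_i) \Big| \, dx_i. \]

Note that $\xi_\epsilon^{(k)}(x) = \xi^{(k)}(x/\epsilon)/\epsilon^{k+1}$, from which it follows that 

\[ \int \Big| \frac{\partial^{k}}{\partial x^{k}} \xi_\epsilon(x) \Big| \, dx \le b_0/\epsilon^k \]

for $k = 1, 2, 3$. For $k = 0$ we of course have 

\[ \int |\xi_\epsilon(x)| \, dx = \int \xi_\epsilon(x) \, dx = 1. \]

Since $|K| = 3$, we therefore achieve the claimed upper bound of $(b_0/\epsilon)^3$. 

□ 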
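Suppose now $A \subseteq \mathbb{R}^d$ is a measurable set. We define: 

\[ A^{+\epsilon} = \{x \in \mathbb{R}^d : (x + [-\epsilon/2, \epsilon/2]^d) \cap A \ne \emptyset\}, \quad A^{-\epsilon} = \{x \in \mathbb{R}^d : x + [-\epsilon/2, \epsilon/2]^d \subseteq A\}, \quad \partial_\epsilon A = A^{+\epsilon} \setminus A^{-\epsilon}. \]

We also define $\psi_{A^{+\epsilon}} = \Xi_\epsilon * \chi_{A^{+\epsilon}}$ as in Lemma 4.13, where $\chi_{A^{+\epsilon}}$ is the 0-1 indicator of $A^{+\epsilon}$, and 
similarly define $\psi_{A^{-\epsilon}}$. Applying now Corollary 4.7 we conclude: 

Lemma 4.14. For $\psi = \psi_{A^{+\epsilon}}$ or $\psi = \psi_{A^{-\epsilon}}$ it holds that 

\[ |\mathbf{E}[\psi(S)] - \mathbf{E}[\psi(G)]| \le (2b_0^3 d^{7/2}/\eta^2 \epsilon^3) \sqrt{\sum_{j=1}^{n} \sigma_j^4}. \]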

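It is clear from the definitions that both $\psi_{A^{+\epsilon}}$ and $\psi_{A^{-\epsilon}}$ have range $[0, 1]$, and that pointwise, 
$\psi_{A^{-\epsilon}} \le \chi_A \le \psi_{A^{+\epsilon}}$. Thus 

\[ \mathbf{E}[\psi_{A^{-\epsilon}}(S)] \le \Pr[S \in A] \le \mathbf{E}[\psi_{A^{+\epsilon}}(S)], \]

\[ \mathbf{E}[\psi_{A^{-\epsilon}}(G)] \le \Pr[G \in A] \le \mathbf{E}[\psi_{A^{+\epsilon}}(G)]. \]

From Lemma 4.14 we have that the two left-hand sides above are close and that the two right-hand 
sides are close. Because of good anti-concentration of Gaussians, it may also be that the left-hand 
and right-hand sides on the second line are close, in which case $\Pr[S \in A]$ and $\Pr[G \in A]$ will also 
be close. This motivates the following observation: $\psi_{A^{+\epsilon}} = \psi_{A^{-\epsilon}} = 1$ on $A^{-\epsilon}$ and $\psi_{A^{+\epsilon}} = \psi_{A^{-\epsilon}} = 0$ 
on the complement of $A^{+\epsilon}$. Hence 

\[ \mathbf{E}[\psi_{A^{+\epsilon}}(G)] - \mathbf{E}[\psi_{A^{-\epsilon}}(G)] \le \Pr[G \in \partial_\epsilon A]. \]

Putting together these observations, we conclude: 

Theorem 4.15. We have 

\[ |\Pr[S \in A] - \Pr[G \in A]| \le (2b_0^3 d^{7/2}/\eta^2 \epsilon^3) \sqrt{\sum_{j=1}^{n} \sigma_j^4} + \Pr[G \in \partial_\epsilon A]. \]

4.4 Translates of unions of orthants 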

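Let us now specialize to the case where $A \subseteq \mathbb{R}^d$ is a translate of a union of orthants. Recall that 
this means that there exists some vector $\Theta \in \mathbb{R}^d$ such that $X \in A$ depends only on $\mathrm{sgn}(X - \Theta)$. 
We make the following observation, whose proof is trivial. 

Proposition 4.16. If $A \subseteq \mathbb{R}^d$ is a union of orthants then 

\[ \partial_\epsilon A \subseteq \bigcup_{i=1}^{d} W_i^\epsilon, \]

where 

\[ W_i^\epsilon = \{X \in \mathbb{R}^d : |X[i] - \Theta[i]| \le \epsilon/2\}. \]

But we also have the following: 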
But we also have the following: 
Proposition 4.17. Assuming the $d$-dimensional Gaussian $G$ with covariance matrix $M$ satisfies 
$M[i,i] = 1$ for all $i \in [d]$, it holds that 

\[ \Pr\Big[ G \in \bigcup_{i=1}^{d} W_i^\epsilon \Big] \le d\epsilon/\sqrt{2\pi}. \]

Proof. By a union bound it suffices to prove that $\Pr[|G[i] - \Theta[i]| \le \epsilon/2] \le \epsilon/\sqrt{2\pi}$. This is 
straightforward, as $G[i]$ has distribution $N(0, 1)$ and hence has pdf bounded above by $1/\sqrt{2\pi}$. □ 

We now prove Theorem 4.1. 

Proof. (Theorem 4.1) For any $\epsilon > 0$, we may combine Propositions 4.16 and 4.17 with 
Theorem 4.15 and conclude 

\[ |\Pr[S \in A] - \Pr[G \in A]| \le (2b_0^3 d^{7/2}/\eta^2 \epsilon^3) \sqrt{\sum_{j=1}^{n} \sigma_j^4} + d\epsilon/\sqrt{2\pi}. \]

The proof is completed by taking $\epsilon = \eta^{-1/2} d^{5/8} (\sum_{j=1}^{n} \sigma_j^4)^{1/8}$ (which is strictly positive since $\sum_j \sigma_j^4 = 0$ 
is impossible). □ 

Identical reasoning gives the proof of Theorem 4.3. Combining Theorems 4.1 and 4.3 gives 
Theorem 4.4. 
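
The choice of $\epsilon$ in the proof of Theorem 4.1 balances the Lindeberg term against the boundary term. The sketch below evaluates both terms at $\epsilon = \eta^{-1/2} d^{5/8} (\sum_j \sigma_j^4)^{1/8}$ and compares them with $\eta^{-1/2} d^{13/8} (\sum_j \sigma_j^4)^{1/8}$. The numeric inputs and the placeholder value $b_0 = 1$ are assumptions chosen only to exercise the formula.

```python
import math

def smoothing_terms(eta, d, sum_sigma4, eps, b0=1.0):
    """The two terms obtained from Theorem 4.15 and Proposition 4.17."""
    lindeberg = 2 * b0**3 * d**3.5 / (eta**2 * eps**3) * math.sqrt(sum_sigma4)
    boundary = d * eps / math.sqrt(2 * math.pi)
    return lindeberg, boundary

eta, d, sum_sigma4 = 0.5, 4, 1e-6            # hypothetical values
eps = eta**-0.5 * d**(5 / 8) * sum_sigma4**(1 / 8)
lindeberg, boundary = smoothing_terms(eta, d, sum_sigma4, eps)
target = eta**-0.5 * d**(13 / 8) * sum_sigma4**(1 / 8)
print(lindeberg / target, boundary / target)
```

Both ratios are constants independent of $\eta$, $d$ and $\sum_j \sigma_j^4$ (namely $2b_0^3$ and $1/\sqrt{2\pi}$), which is exactly what the $O(\cdot)$ in Theorem 4.1 absorbs.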



5 Critical Index for Hypercontractive Random Variables 

In this section, we generalize the critical index to random variables that are hypercontractive. We 
will consider $\eta$-HC random variables $x_0, \dots, x_n$ which are at least pairwise independent. Write 
$\sigma_j^2 = \|x_j\|_2^2$, and note that pairwise independence implies $\|x_0 + \cdots + x_n\|_2^2 = \sigma_0^2 + \cdots + \sigma_n^2$. We 
also write $\tau_i^2 = \|x_i + x_{i+1} + \cdots + x_n\|_2^2 = \sum_{j \ge i} \sigma_j^2$. 

Definition 5.1. For $0 < \delta < 1$, we say that the collection of random variables $x_0, \dots, x_n$ is 
$\delta$-regular if $\sum_{j=0}^{n} \|x_j\|_4^4 \le \delta \big(\sum_{j=0}^{n} \|x_j\|_2^2\big)^2 = \delta \tau_0^4$. 

Definition 5.2. Suppose the sequence $x_0, \dots, x_n$ is ordered, meaning that $\sigma_0^2 \ge \sigma_1^2 \ge \sigma_2^2 \ge \cdots$. 
Then for $0 < \delta < 1$, the $\delta$-critical index is defined to be the smallest index $\ell$ such that the sequence 
$x_\ell, x_{\ell+1}, \dots, x_n$ is $\delta$-regular, or $\ell = \infty$ if no such index exists. 
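
Definitions 5.1 and 5.2 translate directly into a short search over suffixes. The sketch below takes the squared 2-norms and fourth powers of the 4-norms as plain lists; the sample data (scaled $\pm 1$ variables with weights 8, 4, 2, 1, 1, 1, 1, for which $\|x_j\|_2^2 = w_j^2$ and $\|x_j\|_4^4 = w_j^4$) and the function names are illustrative assumptions, and `None` stands in for $\ell = \infty$.

```python
def is_regular(sq_norms, fourth_norms, delta):
    """Definition 5.1: sum of 4-norms^4 is at most delta * (sum of 2-norms^2)^2."""
    total = sum(sq_norms)
    return sum(fourth_norms) <= delta * total * total

def critical_index(sq_norms, fourth_norms, delta):
    """Definition 5.2 for a sequence already ordered by decreasing 2-norm."""
    for start in range(len(sq_norms)):
        if is_regular(sq_norms[start:], fourth_norms[start:], delta):
            return start
    return None  # no regular suffix: the critical index is infinite

weights = [8, 4, 2, 1, 1, 1, 1]
sq = [w**2 for w in weights]
fourth = [w**4 for w in weights]
print(critical_index(sq, fourth, 0.3))
print(critical_index(sq, fourth, 0.1))
```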

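Theorem 5.3. Let $0 < \delta < 1$, $0 < \epsilon < 1/2$, and $s > 1$ be parameters. Let $L = br$, where 
$b = \lceil (2/\eta^4) \ln(1/\epsilon) \rceil$ and $r = \lceil (1/\eta^4 \delta) \ln(1 + 16s^2) \rceil$; note that 

\[ L \le O\Big( \frac{\log(s) \log(1/\epsilon)}{\eta^8} \Big) \cdot \frac{1}{\delta}. \]

Assume the sequence $x_0, \dots, x_n$ is ordered, that $n > L$, and that $x_0, \dots, x_{L-1}$ are independent. 
Then if $\ell$ is the $\delta$-critical index for the sequence, and $\ell > L$, then for all $\theta \in \mathbb{R}$, 

\[ \Pr[|x_0 + \cdots + x_{L-1} - \theta| \le s \cdot \tau_L] \le \epsilon + \frac{O(\ln(1/\epsilon))}{\eta^8 s^4}. \]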
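
The parameters $b$, $r$ and $L = br$ of Theorem 5.3 are chosen so that $(1 - \eta^4/2)^b \le \epsilon$ and $(1 - \eta^4\delta)^r \le 1/(1 + 16s^2)$, the two facts the proof relies on. The sketch below computes them for one hypothetical setting of $(\eta, \delta, \epsilon, s)$; the function name and the sample values are assumptions.

```python
import math

def critical_index_parameters(eta, delta, eps, s):
    """b, r and L = b*r as defined in Theorem 5.3."""
    b = math.ceil((2 / eta**4) * math.log(1 / eps))
    r = math.ceil((1 / (eta**4 * delta)) * math.log(1 + 16 * s**2))
    return b, r, b * r

eta, delta, eps, s = 0.5, 0.1, 0.1, 2        # hypothetical values
b, r, L = critical_index_parameters(eta, delta, eps, s)
print(b, r, L)
print((1 - eta**4 / 2) ** b, eps)
print((1 - eta**4 * delta) ** r, 1 / (1 + 16 * s**2))
```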
Proof. For any $0 \le j < L$, since the critical index $\ell$ exceeds $j$ we have 

\[ \delta \tau_j^4 < \sum_{i \ge j} \|x_i\|_4^4 \le (1/\eta^4) \sum_{i \ge j} \sigma_i^4 \ \text{(since each $x_i$ is $\eta$-HC)} \ \le (\sigma_j^2/\eta^4) \sum_{i \ge j} \sigma_i^2 = (\sigma_j^2/\eta^4) \tau_j^2, \]

where we used hypercontractivity and the fact that the $\sigma_i$'s are ordered. Hence for all $0 \le j < L$, 

\[ \eta^4 \delta \tau_j^2 < \sigma_j^2 = \tau_j^2 - \tau_{j+1}^2 \quad \Rightarrow \quad \tau_{j+1}^2 < (1 - \eta^4 \delta) \tau_j^2. \]

It follows that for all $0 \le k < b$, 

\[ \tau_{(k+1)r}^2 < (1 - \eta^4 \delta)^r \tau_{kr}^2 \le \frac{1}{1 + 16s^2} \tau_{kr}^2, \tag{7} \]

where we used the definition of $r$. 

Now for each $0 \le k < b$ define $y_k = x_{kr} + x_{kr+1} + x_{kr+2} + \cdots + x_{(k+1)r-1}$ and $\nu_k^2 = \|y_k\|_2^2 = 
\tau_{kr}^2 - \tau_{(k+1)r}^2$. Using (7) we immediately conclude 

\[ \nu_k^2 \ge 16 s^2 \tau_{(k+1)r}^2 \quad \Rightarrow \quad \nu_k \ge 4 s \tau_{(k+1)r}. \tag{8} \]

Since all of $x_0, \dots, x_{L-1}$ are independent and $\eta$-HC, we have that $y_0, y_1, \dots, y_{b-1}$ are independent 
$\eta$-HC random variables. For $0 \le k < b$, define the event $A_k$ = "$|y_0 + y_1 + \cdots + y_k - \theta| > (1/2)\nu_k$". 
We claim that for any $0 \le k < b$, 

\[ \Pr[\overline{A_k} \mid \overline{A_0} \wedge \overline{A_1} \wedge \cdots \wedge \overline{A_{k-1}}] \le 1 - \eta^4/2. \]

To see this, note that conditioning only affects the values of random variables $y_0, \dots, y_{k-1}$, of which 
$y_k$ is independent. Further, for every choice of values for $y_0, \dots, y_{k-1}$, the event $A_k$ is an anti- 
concentration event of the type in Proposition 3.7, with some shifted $\theta$. Hence the claim follows 
from this Proposition, as $(1 - (1/2)^2)^2 \ge 1/2$. Having established the claim, we conclude 

\[ \Pr[\overline{A_0} \wedge \overline{A_1} \wedge \cdots \wedge \overline{A_{b-1}}] \le (1 - \eta^4/2)^b \le \epsilon. \tag{9} \]

Let us now define, for each $1 \le k < b$, random variables $z_k = y_k + y_{k+1} + \cdots + y_{b-1}$. These 
random variables are also $\eta$-HC, and they satisfy $\|z_k\|_2^2 \le \tau_{kr}^2$. If we define the events $B_k$ = "$|z_k| > 
s\tau_{kr}$", then Proposition 3.6 implies $\Pr[B_k] \le 1/\eta^4 s^4$. Hence 

\[ \Pr[B_1 \vee B_2 \vee \cdots \vee B_{b-1}] \le (b - 1)/\eta^4 s^4 \le b/\eta^4 s^4. \tag{10} \]

Combining (9) and (10), we see that except with probability less than $\epsilon + b/\eta^4 s^4 \le \epsilon + \frac{O(\ln(1/\epsilon))}{\eta^8 s^4}$, 
at least one event $A_k$ occurs, and none of the events $B_k$ occurs. Since this is the error bound in the 
Theorem, it remains to show that in this case, the desired result "$|x_0 + \cdots + x_{L-1} - \theta| > s \cdot \tau_L$" 
occurs. Assume then that $A_m$ occurs and $B_{m+1}$ does not occur, $0 \le m < b$. (For $m = b - 1$ we 
need not make the latter assumption.) Thus 

\[ |y_0 + y_1 + \cdots + y_m - \theta| > (1/2)\nu_m \quad \text{and} \quad |z_{m+1}| \le s\tau_{(m+1)r} \le (1/4)\nu_m, \]

where we used (8). (This makes sense also in the case $m = b - 1$ if we naturally define $z_b = 0$.) By 
definition of $z_{m+1}$, we therefore obtain 

\[ |y_0 + y_1 + \cdots + y_{b-1} - \theta| = |x_0 + \cdots + x_{L-1} - \theta| > (1/4)\nu_m \ge (1/4)\nu_{b-1} \ge s\tau_{br} = s\tau_L, \]

as desired, where we used (8). □ 

We now state the high-dimensional generalization of Theorem 5.3. Assume $x_1, \dots, x_n$ are $\eta$-HC 
real random variables which are at least pairwise independent. Assume also that $W_1, \dots, W_n$ are 
arbitrary fixed vectors in $\mathbb{R}^d$, and write $X_j = x_j W_j$. 

Theorem 5.4. Let $\delta, \epsilon, s, L$ be as in Theorem 5.3. Then there exists a set of coordinates $H_0 \subseteq [n]$, 
$|H_0| \le dL$, with the following property. Assuming the collection of random variables $\{x_j : j \in H_0\}$ 
is independent, for each coordinate $i \in [d]$ we have either: 

1. the sequence of real random variables $\{X_j[i] : j \notin H_0\}$ is $\delta$-regular; or, 

2. for all $\theta \in \mathbb{R}$, 

\[ \Pr\Big[ \Big| \sum_{j \in H_0} X_j[i] - \theta \Big| \le s \cdot \Big\| \sum_{j \notin H_0} X_j[i] \Big\|_2 \Big] \le \epsilon + \frac{O(\ln(1/\epsilon))}{\eta^8 s^4}. \]



The fact that the sequence $x_0, \dots, x_n$ was ordered by decreasing 2-norm in Theorem 5.3 was 
mainly used for notational convenience. We can extract from the proof the following corollary for 
unordered sequences (whose proof we omit): 

Corollary 5.5. Let $\delta, \epsilon, s, b, r, L$ be as in Theorem 5.3. For the unordered collection $x_0, \dots, x_n$, 
assume we have a sequence of indices $0 \le j_0 < j_1 < \cdots < j_{L-1} \le n$ such that: 

• for each $0 \le t < L$, $\sigma_{j_t}^2 \ge \sigma_{j'}^2$ for all $j' > j_t$; 

• for each $0 \le t < L$, $\{x_{j_t}, x_{j_t+1}, \dots, x_n\}$ is not $\delta$-regular. 

Assume also that $x_0, \dots, x_{j_{L-1}}$ are independent. Then for all $\theta \in \mathbb{R}$, 

\[ \Pr[|x_0 + \cdots + x_{j_{L-1}} - \theta| \le s \cdot \tau_{j_{L-1}+1}] \le \epsilon + \frac{O(\ln(1/\epsilon))}{\eta^8 s^4}. \]

The case when $j_t = t$ for $0 \le t < L$ corresponds to Theorem 5.3. 
We now prove Theorem 5.4. 

Proof. We construct $H_0$ according to an iterative process. Initially, $H_0 = \emptyset$, and we define $c_i = 0$ 
for all $i \in [d]$. In each step of the process, we do the following: First, we select any $i$ such that 
$c_i < L$ and such that the collection $\{X_j[i] : j \notin H_0\}$ is not $\delta$-regular. If there is no such $i$ then we 
stop the whole process. Otherwise, we continue the step by choosing $j \in [n] \setminus H_0$ so as to maximize 
$\|X_j[i]\|_2$. We then end the step by adding $j$ into $H_0$ and incrementing $c_i$. 

Note that the process must terminate with $|H_0| \le dL$; this is because each step increments one 
of $c_1, \dots, c_d$, but no $c_i$ can exceed $L$. When the process terminates, for each $i$ we have either that 
$\{X_j[i] : j \notin H_0\}$ is $\delta$-regular or that $c_i = L$. 

It suffices then to show that when $c_i = L$, the anti-concentration statement holds for $i$. To see 
this, first reorder the sequence of random variables $(X_j[i])_j$ so that the first $|H_0|$ are in the order 
that the indices were added to $H_0$, and the remaining $n - |H_0|$ are in an arbitrary order. Write 
$1 \le j_0 < j_1 < \cdots < j_{L-1} \le |H_0|$ for the indices that were added to $H_0$ on those steps which 
incremented $c_i$. Then by the definition of the iterative process, for each $0 \le t < L$ we have that 
$\|X_{j_t}[i]\|_2^2 \ge \|X_{j'}[i]\|_2^2$ for all $j' > j_t$ and that $\{X_{j_t}[i], X_{j_t+1}[i], \dots, X_n[i]\}$ is not $\delta$-regular. The 
anti-concentration statement now follows from Corollary 5.5. □ 
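
The iterative process in the proof of Theorem 5.4 can be sketched as follows, under the simplifying assumption that each $x_j$ is a uniform $\pm 1$ variable, so that $\|X_j[i]\|_2^2 = W_j[i]^2$ and $\|X_j[i]\|_4^4 = W_j[i]^4$. The weight matrix, the parameter values and the function names are hypothetical; for other $\eta$-HC variables the two norms would have to be supplied separately.

```python
def is_regular(column, delta):
    """Definition 5.1 for scaled +-1 variables with the given weights."""
    s2 = sum(w * w for w in column)
    s4 = sum(w**4 for w in column)
    return s4 <= delta * s2 * s2

def build_h0(W, delta, L):
    """Greedy construction of H_0; W[j][i] is coordinate i of the vector W_j."""
    n, d = len(W), len(W[0])
    h0, counts = [], [0] * d
    while True:
        rest = [j for j in range(n) if j not in h0]
        # Any coordinate that is still irregular and has budget left.
        pick = next((i for i in range(d)
                     if counts[i] < L
                     and not is_regular([W[j][i] for j in rest], delta)), None)
        if pick is None:
            return h0, counts
        # Move the heaviest remaining index in that coordinate into H_0.
        heaviest = max(rest, key=lambda j: abs(W[j][pick]))
        h0.append(heaviest)
        counts[pick] += 1

W = [[8, 1], [4, 1], [2, 1], [1, 1], [1, 1], [1, 1], [1, 1]]
h0, counts = build_h0(W, delta=0.3, L=5)
print(h0, counts)
```

On this input the first coordinate becomes regular after three heavy indices are removed, while the second coordinate is regular from the start, so the process stops well before the budget $dL$ is reached.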

6 The Meka-Zuckerman Generator 



For the Meka-Zuckerman generator, the first step is to reduce the problem of fooling functions of 
halfspaces under an arbitrary $C$-bounded product distribution to fooling an $O(C)$-bounded discrete 
product distribution with support $\mathrm{poly}(n, C, \epsilon^{-1})$ in each coordinate. 

Lemma 6.1. Given a C-bounded distribution X, there is a discrete product distribution Y such 
that if f : M n — > { — 1,1} is a function of d halfspaces {hi : W 1 — > {— 1, ljjigu], then 



Each yi is distributed uniformly over a multiset = {b\(i) < ••• < b g (i)} where \bj(i)\ < 
{nC 2 e~ l )i. For every i, we have = 2 s = 0(n 2 C 2 e~ 2 ) and Further E[yj] = 0,E[y 2 ] = 



We are interested in d « n, so the error in going from X to Y is o(e 2 ). Since = 2 s for 
all i, sampling /c-wise independently from Y reduces to generating n strings of length s in a /c-wise 
independent manner: this can be done using k max(log n, s) = 0(k log(nC/e)) random bits. 

This lemma is proved by sandwiching X between two discrete product distributions Y u and 
Y e which are close to each other in statistical distance. The proof is in section EB Henceforth, we 
will rename Y as X and focus on fooling discrete product distributions. 

We now describe the main generator of Meka-Zuckerman, modified so that random variables 
take values in $\prod_j \Omega_j$ instead of simply $\pm 1$. At a high level, the generator hashes variables into 
buckets and uses bounded independence for the variables within each bucket. We use a weaker 
property of hash functions than used in [MZ09]. 

The generator first picks a partition of $[n] = H_1 \cup \dots \cup H_t$ using a random element from $\mathcal{H}$, 
a 1-collision preserving family of hash functions. For each $i \in [t]$, it then generates a 5-wise 
independent distribution $(y_j)_{j \in H_i}$ on $\prod_{j \in H_i} \Omega_j$. Such a distribution on $n$ random variables can be 
generated using a seed of length $k \log \max(n, |\Omega|)$ with $k = 5$. These $t$ distributions are chosen independently. 
The generator outputs $Y = (y_1, \dots, y_n)$. The seed length required is $\log(2n) + 5t \log \max(n, |\Omega|)$, 
where $\log(2n)$ bits are required for the hash function and $5 \log \max(n, |\Omega|)$ bits are needed for each $H_i$. 
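
The structure of the generator (hash into buckets, bounded independence inside each bucket, fresh randomness across buckets) can be sketched with textbook components. In the toy version below the hash is $j \mapsto ((aj + c) \bmod p) \bmod t$, and $k$-wise independence inside a bucket comes from evaluating a random polynomial of degree less than $k$ over $\mathbb{Z}_p$ at distinct points. Both choices, the output alphabet $\mathbb{Z}_p$ in place of $\Omega_j$, and all names and parameters are assumptions for illustration rather than the construction analysed in the paper.

```python
import random

def eval_poly(coeffs, x, p):
    """Evaluate sum_i coeffs[i] * x**i modulo p via Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def toy_generator(n, t, k, p, seed):
    """Hash [n] into t buckets, then fill each bucket k-wise independently."""
    assert n <= p, "evaluation points must be distinct modulo p"
    rng = random.Random(seed)
    a, c = rng.randrange(1, p), rng.randrange(p)       # hash h(j) = ((a*j + c) mod p) mod t
    buckets = [((a * j + c) % p) % t for j in range(n)]
    y = [0] * n
    for bucket in range(t):
        coeffs = [rng.randrange(p) for _ in range(k)]  # independent seed per bucket
        for j in range(n):
            if buckets[j] == bucket:
                y[j] = eval_poly(coeffs, j, p)
    return buckets, y

buckets, y = toy_generator(n=12, t=4, k=5, p=13, seed=0)
print(buckets)
print(y)
```

The seed consumed here is one hash description plus $k$ field elements per bucket, mirroring the $\log(2n) + 5t \log \max(n, |\Omega|)$ accounting above.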



7 Analyzing the Meka-Zuckerman Generator 

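We first prove that the indices in the set $H_0$ are likely to be hashed into distinct buckets. 

Definition 7.1. A hash function $h : [n] \to [t]$ is $S$-isolating if for all $x \ne y \in S$, $h(x) \ne h(y)$. A 
family of hash functions $\mathcal{H} = \{h : [n] \to [t]\}$ is $(\ell, \beta)$-isolating if for any $S \subseteq [n]$, $|S| \le \ell$, 

\[ \Pr_{h \in_u \mathcal{H}}[h \text{ is not } S\text{-isolating}] \le \beta. \]

Lemma 7.2. Assume $t$ is a power of 2. A $b$-collision preserving family of hash functions $\mathcal{H} = \{h : 
[n] \to [t]\}$ is $(\ell, \beta)$-collision free (that is, $(\ell, \beta)$-isolating in the sense of Definition 7.1) for $\beta = b\ell^2/(2t)$. 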




1, E[yf] < 0(C). 



i e [t]. 




That is, a $b$-collision preserving hash family is likely to be isolating for small sets. 
Proof. The expected number of collisions for a set $S$ is at most 

\[ \binom{|S|}{2} \frac{b}{t} \le \frac{b\ell^2}{2t}, \]

so the probability that some pair of elements of $S$ collides is at most this quantity. 

By increasing $n$ to the next largest power of 2, since $t$ is a power of 2, there is a field $F$ of size 
$n$ where $t \mid n$. Then there is a hash family of size $n$ for $b = 1$. For any element $a \in F$, define a hash 
function $h_a(x) = (ax) \bmod t$. Here $x$ is viewed as a field element, the multiplication is done in the 
field, and the product is then viewed as a nonnegative integer less than $n$ before taking the mod. 
We can increase $n$ and $t$ to be the nearest powers of 2. We can therefore take $\mathcal{H}$ to have size at 
most $2n$ for $b = 1$. □ 
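
The union-bound argument in the proof above can be checked exactly on a small example. The sketch below uses the textbook family $x \mapsto ((ax + c) \bmod p) \bmod t$ with $a \ne 0$, whose pairwise collision probability is at most $1/t$, as a stand-in for the field-based family $h_a$ described in the proof; the set $S$, the parameters and the function name are assumptions.

```python
from fractions import Fraction
from math import comb

def non_isolating_fraction(S, p, t):
    """Exact probability, over the family, that two elements of S share a bucket."""
    bad = total = 0
    for a in range(1, p):
        for c in range(p):
            total += 1
            images = [((a * x + c) % p) % t for x in S]
            bad += len(set(images)) < len(S)
    return Fraction(bad, total)

S, p, t = [0, 1, 2], 31, 16
failure = non_isolating_fraction(S, p, t)
bound = Fraction(comb(len(S), 2), t)   # C(|S|, 2) * b / t with b = 1
print(failure, bound)
```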

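We want the set $H_0$ to be isolated with error $\epsilon$, so we want $\ell = dL$ and $\beta = \epsilon$. Hence we set $t$ 
to be the smallest power of 2 larger than $\ell^2/\epsilon = (dL)^2/\epsilon$. 

We will aim to achieve error $O(d\epsilon)$ (rather than $O(\epsilon)$), as this makes the notation easier. We 
set the parameters $s$ and $\delta$ in Theorem 5.4 as 

\[ s = \frac{1}{\eta^2 \sqrt{\epsilon}}, \qquad \delta = \frac{\eta^4 \epsilon^8}{d^7}. \]

This implies that 

\[ L = O\Big( \frac{\log(s) \log(1/\epsilon)}{\eta^8} \Big) \frac{1}{\delta} = O\Big( \frac{d^7 \log^2(1/\epsilon\eta)}{\eta^{12} \epsilon^8} \Big), \qquad t = O\Big( \frac{(dL)^2}{\epsilon} \Big) = O\Big( \frac{d^{15} \log^4(1/\epsilon\eta)}{\eta^{24} \epsilon^{17}} \Big). \]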

7.1 Analysis for functions of regular halfspaces 

Recall that our goal is to fool functions of $\mathrm{sgn}(\sum_j x_j W_j - \Theta)$. Let $Y_j = y_j W_j$ and $T = \sum_{j=1}^{n} Y_j$. 
Similarly let $X_j = x_j W_j$ and $S = \sum_{j=1}^{n} X_j$. Thus we are interested in bounding 

\[ |\Pr[S \in A] - \Pr[T \in A]| \]

where $A$ is a translate of a union of orthants: membership of a point $X \in \mathbb{R}^d$ in $A$ is a function of 
$\mathrm{sgn}(X - \Theta)$. By rescaling the $W_j$ and $\Theta$, we may assume without loss of generality that 

\[ M[i,i] = \sum_{j=1}^{n} \mathbf{E}[X_j[i]^2] = 1 \quad \text{for all } i \in [d]. \]

The regular case is when the vectors $W_1, \dots, W_n$ are such that for every $i$, the sequence of random 
variables $\{X_j[i]\}_{j=1}^{n}$ is $\delta$-regular. In this case, we can directly appeal to the Berry-Esseen theorem 
to prove the correctness of the MZ generator. 

Theorem 7.3. If the sequence of random variables $\{X_j[i]\}_{j=1}^{n}$ is $\delta$-regular for all $i \in [d]$, then the 
MZ generator $O(d\epsilon)$-fools any function of $\mathrm{sgn}(W \cdot X - \Theta)$ for all $\Theta \in \mathbb{R}^d$. 

Proof. We apply the machinery developed above. For the regular case, we only need 
to use 4-wise independence. Thus, the random variables $Y_1, \dots, Y_n$ satisfy the 4-matching-moments 
condition with respect to $X_1, \dots, X_n$, as defined in Subsection 4.2. 

The definition of $\delta$-regular is given in Definition 5.1. Let $\sigma_{i,j} = \|X_j[i]\|_2$. Suppose that for all $i$, 
the set of real random variables $\{X_j[i]\}$ is $\delta$-regular, i.e., 

\[ \sum_{j=1}^{n} \|X_j[i]\|_4^4 \le \delta \Big( \sum_{j=1}^{n} \|X_j[i]\|_2^2 \Big)^2 = \delta (\sigma_{i,1}^2 + \cdots + \sigma_{i,n}^2)^2 = \delta, \]

where the last equality is from our normalization. We wish to apply Theorem 4.4. Since $\sigma_{i,j} = 
\|X_j[i]\|_2 \le \|X_j[i]\|_4$, we conclude that for all $i$, 

\[ \sum_{j=1}^{n} \sigma_{i,j}^4 \le \sum_{j=1}^{n} \|X_j[i]\|_4^4 \le \delta. \]

Since $\sigma_j^2 = \sum_{i=1}^{d} \sigma_{i,j}^2$, by Cauchy-Schwarz we get 

\[ \sigma_j^4 = \Big( \sum_{i=1}^{d} \sigma_{i,j}^2 \Big)^2 \le d \sum_{i=1}^{d} \sigma_{i,j}^4. \]

Therefore $\sum_{j=1}^{n} \sigma_j^4 \le d^2 \delta$. Hence we can apply Theorem 4.4 to obtain 

\[ |\Pr[S \in A] - \Pr[T \in A]| \le O\big( (1/\eta)^{1/2} d^{15/8} \big) \cdot (t^{-1} + \delta)^{1/8} \le O(d\epsilon), \]

where the last inequality follows from the choice of $t$, $\delta$. □ 

7.2 Analysis for functions of general halfspaces 

We now combine Theorem 5.4 with the analysis of the regular case (Theorem 7.3) to prove that 
the MZ generator fools functions of arbitrary halfspaces. 

Theorem 7.4. The MZ generator $O(d\epsilon)$-fools any function of $d$ halfspaces with seed length 

\[ O(t \log(\max(n, |\Omega|))) = O\Big( \frac{d^{15} \log^4(1/\epsilon\eta) \log(n/\epsilon\eta)}{\eta^{24} \epsilon^{17}} \Big). \]
Proof. Apply Theorem 5.4 with these parameters. Then there exists a set $H_0 \subseteq [n]$ of size at most 
$dL$ such that the coordinates $[d]$ can be partitioned into two sets, REG and JUNTA, such that the 
following holds. 

1. For $i \in$ REG, the set of real random variables $\{X_j[i] : j \notin H_0\}$ is $\delta$-regular. 

2. For $i \in$ JUNTA, for all $\theta \in \mathbb{R}$, 

\[ \Pr\Big[ \Big| \sum_{j \in H_0} X_j[i] - \theta \Big| \le s \cdot \Big\| \sum_{j \notin H_0} X_j[i] \Big\|_2 \Big] \le 2\epsilon. \tag{11} \]



We condition on the hash function $h$ being $H_0$-collision free, which happens with probability at 
least $1 - \epsilon$. Therefore, at most one variable from $H_0$ lands in each set in the partition. Since the 
distribution in each partition set is 5-wise independent, this means that the distribution on $H_0$ is 
fully independent. This allows us to construct a coupling of $X$ and $Y$: let $X_j = Y_j$ for $j \in H_0$, 
and then sample the rest according to the correct marginal distribution. 

We say that the variables in $H_0$ are good if 

\[ \Big| \sum_{j \in H_0} Y_j[i] - \Theta[i] \Big| > s \cdot \Big\| \sum_{j \notin H_0} Y_j[i] \Big\|_2 \quad \text{for all } i \in \text{JUNTA}. \]

By Equation (11), 

\[ \Pr[\{X_j = Y_j\}_{j \in H_0} \text{ are not good}] \le 2d\epsilon. \tag{12} \]

We condition on these variables being good. 

With this conditioning, we show that the halfspaces in JUNTA are nearly constant: with high 
probability they do not depend on the variables outside $H_0$. To see this, observe that conditioned 
on the variables in $H_0$, the remaining variables are still 4-wise independent (in both $X$ and $Y$), so 
by Chebyshev 

\[ \Pr\Big[ \Big| \sum_{j \notin H_0} Y_j[i] \Big| \ge s \cdot \Big\| \sum_{j \notin H_0} Y_j[i] \Big\|_2 \Big] \le 1/s^2 \le \epsilon. \tag{13} \]

But if this does not happen, then 

\[ \mathrm{sgn}\Big( \sum_{j \in [n]} Y_j[i] - \Theta[i] \Big) = \mathrm{sgn}\Big( \sum_{j \in H_0} Y_j[i] - \Theta[i] \Big). \]

A similar analysis holds for $X$. Thus for both $X$ and $Y$, with error probability at most $2d/s^2 \le 2d\epsilon$, 
we can assume that the halfspaces in JUNTA are fixed to constant functions for a good choice of 
variables in $H_0$. 

Recall that we are interested in fooling functions of the form $g(h_1(X), \dots, h_d(X))$. Conditioned 
on the variables in $H_0$ being good, the halfspaces $h_j$ for $j \in$ JUNTA are close to constant functions. 
Thus, the function $g$ is $2d\epsilon$-close to a function $g'$ of the halfspaces $\{h_j\}_{j \in \text{REG}}$ under both distributions 
$X$ and $Y$. Thus it suffices to show that the bias of $g'$ under $X$ and $Y$ is close. 

Conditioning on $X_j = Y_j$ for $j \in H_0$ gives a halfspace on the remaining variables in each 
coordinate $i \in$ REG. Define 

\[ S'[i] = \sum_{j \notin H_0} X_j[i], \qquad T'[i] = \sum_{j \notin H_0} Y_j[i], \qquad \Theta'[i] = \Theta[i] - \sum_{j \in H_0} X_j[i]; \]

then 

\[ \mathrm{sgn}(S[i] - \Theta[i]) = \mathrm{sgn}(S'[i] - \Theta'[i]). \]

Thus there exists a union of orthants $A' \subseteq \mathbb{R}^{|\text{REG}|}$ such that $g'(X) = 1$ if $X \in A'$. Our goal is to 
bound 

\[ |\Pr[S' \in A'] - \Pr[T' \in A']|. \]



The set of random variables $\{X_j[i] : j \notin H_0\}$ is $\delta$-regular. Hence we can apply our result 
for the regular case. We've already conditioned on the hash function $h$ being $H_0$-collision free. 
Since this happens with probability at least $1 - \epsilon$, the resulting family is $b$-collision preserving for 
$b = 1/(1 - \epsilon) \le 2$, since conditioning on an event which happens with probability $p$ can increase 
the probability of any other event by a factor of at most $1/p$. So now applying the analysis from 
the regular case, 

\[ |\Pr[S' \in A'] - \Pr[T' \in A']| \le O\big( \eta^{-1/2} d^{15/8} (1/t + \delta)^{1/8} \big) \le O(d\epsilon). \tag{14} \]

Hence, conditioned on $h$ and the variables in $H_0$ being good, we have 

\[ \Big| \Pr_S[S \in A] - \Pr_T[T \in A] \Big| \le O(d\epsilon) + 2d\epsilon. \tag{15} \]

Removing the conditioning gives 

\[ \Big| \Pr_S[S \in A] - \Pr_T[T \in A] \Big| \le O(d\epsilon) + 2d\epsilon + \epsilon + 2d\epsilon = O(d\epsilon). \]

□ 



8 Generalized Monotone Trick 

We generalize the "monotone trick" introduced in Meka and Zuckerman [MZ09] and show that a 
generator that fools small-width "monotone" branching programs also fools any monotone function 
of several arbitrary-width monotone branching programs. 

First we define read-once branching programs. Branching programs corresponding to space $S$ 
have width $2^S$. We use the following notation from [MZ09]. 

Definition 8.1 (ROBP). An $(S, D, T)$-branching program $B$ is a layered multi-graph with a layer 
for each $0 \le i \le T$ and at most $2^S$ vertices (states) in each layer. The first layer has a single vertex 
$v_0$ and each vertex in the last layer is labeled with 0 (rejecting) or 1 (accepting). For $0 \le i < T$, 
a vertex $v$ in layer $i$ has at most $2^D$ outgoing edges each labeled with an element of $\{0, 1\}^D$ and 
pointing to a vertex in layer $i + 1$. 

Let B be an (S, D, T)-branching program and v a vertex in layer i of B. We now define the set 
of accepting suffixes. 

Definition 8.2. We say $z$ is an accepting suffix from vertex $v$ if the path in $B$ starting at $v$ and 
following edges labeled according to $z$ leads to an accepting state. We let $\mathrm{Acc}_B(v)$ denote the set of 
accepting suffixes from $v$. If $B$ is understood we may abbreviate this $\mathrm{Acc}(v)$. 

Nisan [Nis92] and Impagliazzo et al. [INW94] gave PRGs that fool $(S, D, T)$-branching programs 
with error exp(2" n ( 5+D )) and seed length r = 0((S + D + log T) log T). For T = poly(S,D), 
the PRG of Nisan and Zuckerman [NZ96] fools $(S, D, T)$-branching programs with seed length 
$r = O(S + D)$. Meka and Zuckerman showed that the above PRGs in fact fool arbitrary-width 
branching programs of a certain form called monotone, defined next. 

Definition 8.3 (Monotone ROBP). An $(S, D, T)$-branching program $B$ is said to be monotone if 
for all $0 \le i < T$, there exists an ordering $v_1 \prec v_2 \prec \cdots$ of the vertices in layer $i$ such that 
$v \prec w$ implies $\mathrm{Acc}_B(v) \subseteq \mathrm{Acc}_B(w)$. 

Note that the natural ROBP accepting a halfspace, where states correspond to partial sums, is 
monotone. However, the natural ROBP accepting the intersection of just two halfspaces may not 
be monotone. 
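
The claim that the natural ROBP for a halfspace is monotone can be verified by brute force on a small instance. In the sketch below each layer reads one bit ($D = 1$), a state is the partial sum of the weights read so far, and states are ordered by that sum; the weights, the threshold and the function names are hypothetical.

```python
from itertools import product

def accepting_suffixes(partial_sum, weights, theta):
    """All bit suffixes z with partial_sum + <weights, z> >= theta."""
    m = len(weights)
    return {z for z in product((0, 1), repeat=m)
            if partial_sum + sum(w * b for w, b in zip(weights, z)) >= theta}

def halfspace_robp_is_monotone(weights, theta):
    """Check Definition 8.3 with states in each layer ordered by partial sum."""
    T = len(weights)
    for layer in range(T + 1):
        states = sorted({sum(w * b for w, b in zip(weights[:layer], z))
                         for z in product((0, 1), repeat=layer)})
        acc = [accepting_suffixes(v, weights[layer:], theta) for v in states]
        # Consecutive containment implies the whole chain is nested.
        if any(not lo <= hi for lo, hi in zip(acc, acc[1:])):
            return False
    return True

print(halfspace_robp_is_monotone([3, -1, 2, 5], theta=4))
```

Ordering by partial sum works because a larger partial sum can only turn rejecting suffixes into accepting ones; no single ordering of this kind is available in general for pairs of partial sums, which is the obstacle for intersections of two halfspaces.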

The following theorem is the only known way to obtain PRGs for halfspaces using seed length 
which depends logarithmically on 1/e (and polylogarithmically on n). 

Theorem 8.4. [MZ09] Let $0 < \epsilon < 1$ and $G : \{0, 1\}^R \to (\{0, 1\}^D)^T$ be a PRG that $\delta$-fools monotone 
$(\log(4T/\epsilon), D, T)$-branching programs. Then $G$ $(\epsilon + \delta)$-fools monotone $(S, D, T)$-branching programs 
for arbitrary $S$. 

We now generalize Theorem 8.4 to the intersection of monotone branching programs, or even to 
any monotone function of monotone branching programs. (Of course, the intersection corresponds 
to the monotone function AND.) 

Theorem 8.5. Let $0 < \epsilon < 1$ and $G : \{0, 1\}^R \to (\{0, 1\}^D)^T$ be a PRG that $\delta$-fools monotone 
$(d \log(4Td/\epsilon), D, T)$-branching programs. Then $G$ $(\epsilon + \delta)$-fools any monotone function of $d$ 
monotone $(S, D, T)$-branching programs for arbitrary $S$. 

We now generalize monotone functions to decision trees. First note that the complement of 
a monotone branching program is a monotone branching program. Now consider any decision 
tree, where each node of the decision tree is a monotone branching program. Any leaf of this tree 
represents the intersection of monotone branching programs. Thus, the error of the function above 
for such decision trees is at most s times the error for each leaf. This gives the following corollary. 

Corollary 8.6. Let $0 < \epsilon < 1$ and $G : \{0, 1\}^R \to (\{0, 1\}^D)^T$ be a PRG that $\delta$-fools monotone 
$(d \log(4Td/\epsilon), D, T)$-branching programs. Then $G$ $(s(\epsilon + \delta))$-fools any decision tree with $s$ leaves, 
where each decision tree node is a monotone $(S, D, T)$-branching program for arbitrary $S$. 

In the above, we can even take $s$ to be the minimum of the number of 0 and 1 leaves. We 
now prove Theorem 8.5, using the ideas of [MZ09] based on "sandwiching" monotone branching 
programs between small-width branching programs. 

Definition 8.7. A pair of functions $(f_{\mathrm{down}}, f_{\mathrm{up}})$, each with the same domain and range as a function
$f : B \to \{0,1\}$, is said to $\epsilon$-sandwich $f$ if the following hold.

1. For all $z \in B$, $f_{\mathrm{down}}(z) \le f(z) \le f_{\mathrm{up}}(z)$.

2. $\Pr_{z \in_u B}[f_{\mathrm{up}}(z) = 1] - \Pr_{z \in_u B}[f_{\mathrm{down}}(z) = 1] \le \epsilon$.
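
For intuition, Definition 8.7 can be checked by brute force when the domain is small. The following Python sketch does this for toy threshold functions on 4-bit strings; the helper name `is_sandwich` and the example functions are illustrative assumptions and are not objects taken from [MZ09].

```python
from itertools import product

def is_sandwich(f_down, f, f_up, domain, eps):
    """Brute-force check of the two conditions of an eps-sandwich on a finite domain."""
    points = list(domain)
    # Condition 1: pointwise ordering f_down <= f <= f_up.
    ordered = all(f_down(z) <= f(z) <= f_up(z) for z in points)
    # Condition 2: acceptance probabilities under the uniform distribution differ by at most eps.
    gap = sum(f_up(z) - f_down(z) for z in points) / len(points)
    return ordered and gap <= eps

# Hypothetical toy example: thresholds on the Hamming weight of 4-bit strings.
cube = list(product((0, 1), repeat=4))
f = lambda z: int(sum(z) >= 2)
f_down = lambda z: int(sum(z) >= 3)   # accepts 5 of the 16 points
f_up = lambda z: int(sum(z) >= 1)     # accepts 15 of the 16 points
```

Here the pair is a 0.625-sandwich of `f`, since the two acceptance probabilities are 15/16 and 5/16.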

The following lemma shows that it suffices to fool functions which sandwich the given target 
function. Bazzi [Baz09] used sandwiching in showing that polylog-wise independence fools DNF 
formulas. The lemma below is a small modification of a lemma in [MZ09].

Lemma 8.8. If $(f_{\mathrm{down}}, f_{\mathrm{up}})$ $\epsilon$-sandwich $f$, and a PRG $G$ $\delta$-fools $f_{\mathrm{down}}$ and $f_{\mathrm{up}}$, then $G$ $(\epsilon + \delta)$-fools $f$.

Meka and Zuckerman then showed that any monotone branching program can be sandwiched 
between two small-width branching programs. 

Lemma 8.9. [MZ09] For any monotone $(S, D, T)$-branching program $B$, there exist monotone
$(\log(4T/\epsilon), D, T)$-branching programs $(B^{\mathrm{down}}, B^{\mathrm{up}})$ that $\epsilon$-sandwich $B$.

Using this, we can show that any monotone function of monotone branching programs is sandwiched
between two small-width branching programs.

Lemma 8.10. Any monotone function of $d$ monotone $(S, D, T)$-branching programs has a pair of $(d\log(4T/\epsilon), D, T)$-branching
programs $(B^{\mathrm{down}}, B^{\mathrm{up}})$ that $(d\epsilon)$-sandwich it.

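Proof. For a monotone branching program $B$, let $(B^{\mathrm{down}}, B^{\mathrm{up}})$ denote monotone $(\log(4T/\epsilon), D, T)$-branching
programs that $\epsilon$-sandwich $B$, as given by Lemma 8.9. Suppose our given function is
$f(z) = g(B_1(z), B_2(z), \ldots, B_d(z))$ for $g$ monotone. Then $f(z)$ is sandwiched by $(f_{\mathrm{down}}, f_{\mathrm{up}})$ given
by

$$f_{\mathrm{down}}(z) = g(B_1^{\mathrm{down}}(z), B_2^{\mathrm{down}}(z), \ldots, B_d^{\mathrm{down}}(z)), \qquad f_{\mathrm{up}}(z) = g(B_1^{\mathrm{up}}(z), B_2^{\mathrm{up}}(z), \ldots, B_d^{\mathrm{up}}(z)).$$

Moreover,

$$f_{\mathrm{up}}^{-1}(1) \setminus f_{\mathrm{down}}^{-1}(1) \subseteq \bigcup_{i=1}^{d} \left( (B_i^{\mathrm{up}})^{-1}(1) \setminus (B_i^{\mathrm{down}})^{-1}(1) \right).$$

Since $\Pr_z[B_i^{\mathrm{up}}(z) = 1] - \Pr_z[B_i^{\mathrm{down}}(z) = 1] \le \epsilon$ for each $i$, it follows that $\Pr_z[f_{\mathrm{up}}(z) = 1] - \Pr_z[f_{\mathrm{down}}(z) = 1] \le d\epsilon$. □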
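
The composition step in this proof can be sanity-checked on a toy instance. In the sketch below the "programs" are plain Boolean threshold functions on 6-bit inputs (an assumption made for brevity; they are not actual branching programs), the monotone function g is AND, and `accept_prob` and `compose` are hypothetical helper names.

```python
from itertools import product

def accept_prob(fn, points):
    """Fraction of points accepted by a 0/1-valued function."""
    return sum(fn(z) for z in points) / len(points)

def compose(g, parts):
    """Return z -> g(parts[0](z), ..., parts[d-1](z))."""
    return lambda z: g(tuple(part(z) for part in parts))

cube = list(product((0, 1), repeat=6))
# d = 2 toy monotone functions with pointwise lower and upper approximations.
B = [lambda z: int(sum(z[:3]) >= 2), lambda z: int(sum(z[3:]) >= 2)]
B_down = [lambda z: int(sum(z[:3]) >= 3), lambda z: int(sum(z[3:]) >= 3)]
B_up = [lambda z: int(sum(z[:3]) >= 1), lambda z: int(sum(z[3:]) >= 1)]
g = lambda bits: int(all(bits))  # AND is monotone

f, f_down, f_up = compose(g, B), compose(g, B_down), compose(g, B_up)
# Largest sandwiching gap among the individual pairs, and the gap of the composed pair.
eps = max(accept_prob(up, cube) - accept_prob(down, cube) for down, up in zip(B_down, B_up))
gap = accept_prob(f_up, cube) - accept_prob(f_down, cube)
```

In this instance each individual pair has gap 0.75 and the composed pair also has gap 0.75, which is within the bound $d\epsilon = 1.5$.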

Theorem 8.5 now follows from Lemmas 8.8 and 8.10. Without using any of the hard work we've
done in other sections, this theorem gives us PRGs for monotone functions of halfspaces (such as
intersections) using a random seed of length $O(d(\log n)\log(n/\epsilon))$. We improve this seed length now.

8.1 Combining the Monotone Trick and the main construction 

Fix a hash function $h$, which fixes the partition into $t$ sets. Then any monotone function of $d$ halfspaces of the form
$\mathrm{sgn}(y_1 w_1 + \cdots + y_n w_n - \theta)$ may be computed by a monotone function of $d$ monotone branching
programs, with $t$ layers each. Thus, we can apply Theorem 8.5 and Corollary 8.6 to deduce
Theorem 1.2.

We can set T = t and D = O(logn) to store the seed for the 5-wise independent distribution. 
Also note that $\log \eta^{-1} = O(\log C)$. With these parameters, using Nisan's PRG gives a seed length of
$O((d\log(Cd/\epsilon) + D + \log T)\log T) = O((d\log(Cd/\epsilon) + \log n)\log(Cd/\epsilon))$ to fool monotone functions
of $d$ halfspaces. For functions computable by size-$s$ decision trees of halfspaces, the seed length
becomes $O((d\log(Cds/\epsilon) + \log n)\log(Cds/\epsilon))$.

When $Cd/\epsilon \le \log^{c} n$ for some constant $c > 0$, we have $t = \mathrm{polylog}(n)$ and we can use the Nisan-Zuckerman
PRG. This gives a seed length of $O(d\log(Cd/\epsilon) + D + \log T) = O(d\log(Cd/\epsilon) + \log n)$ for monotone
functions of $d$ halfspaces. For functions computable by size-$s$ decision trees of halfspaces, the seed
length becomes $O(d\log(Cds/\epsilon) + \log n)$.

More generally, using Armoni's interpolation of Nisan and Nisan-Zuckerman will shave an
extra $\log\log n$ factor off of Nisan's PRG when $t/\epsilon \le \exp((\log n)^{1-\gamma})$ for some $\gamma > 0$. We omit the
details.

9 Discretizing the distribution 

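The first step is to truncate each $x_i$ to lie in the range $(-B, B)$.

Lemma 9.1. Set $B = (nC^2/\epsilon)^{1/4}$. For each $i \in [n]$, let $y_i = x_i \cdot I(|x_i| < B)$. Define the product
random variable $Y = (y_1, y_2, \ldots, y_n)$ where the $y_i$s are independent. Then we have

• $\mathrm{SD}(X, Y) \le \epsilon$.

• $\mathbb{E}[y_i^2] \ge 1/2$, $\mathbb{E}[y_i^4] \le C$.

Proof. Note that $y_i = x_i$ when $|x_i| < B$ and $y_i = 0$ otherwise. But we have

$$\Pr[|x_i| \ge B] = \Pr[|x_i|^4 \ge B^4] \le \frac{\mathbb{E}[x_i^4]}{B^4} \le \frac{C}{B^4} = \frac{\epsilon}{nC}.$$

Thus it follows that

$$\mathrm{SD}(x_i, y_i) \le \frac{\epsilon}{nC} \implies \mathrm{SD}(X, Y) \le \frac{\epsilon}{C} \le \epsilon.$$

It is clear that $\mathbb{E}[y_i^4] \le \mathbb{E}[x_i^4] \le C$. Thus we only need to prove the claim about the two-norm.
We have

$$x_i = x_i \cdot I(|x_i| < B) + x_i \cdot I(|x_i| \ge B) = y_i + x_i \cdot I(|x_i| \ge B),$$

from which it follows that

$$\mathbb{E}[x_i^2] = \mathbb{E}[y_i^2] + \mathbb{E}[x_i^2 \cdot I(|x_i| \ge B)].$$

By the Cauchy-Schwarz inequality, we have

$$\mathbb{E}[x_i^2 \cdot I(|x_i| \ge B)] \le \sqrt{\mathbb{E}[x_i^4]} \cdot \sqrt{\Pr[|x_i| \ge B]} \le \sqrt{C \cdot \frac{\epsilon}{nC}} = \sqrt{\epsilon/n}.$$

Hence, since $\mathbb{E}[x_i^2] = 1$ and $\sqrt{\epsilon/n} \le 1/2$, we have $\mathbb{E}[y_i^2] \ge 1/2$.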

By a similar argument, one can show that $|\mathbb{E}[y_i]| \le \sqrt{\epsilon/(nC)}$. □
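
As a numerical illustration of the truncation step, the sketch below evaluates the Markov bound $C/B^4$ from the proof, assuming the threshold is $B = (nC^2/\epsilon)^{1/4}$ (the choice that makes the per-coordinate tail bound equal $\epsilon/(nC)$). The function name and the sample values are hypothetical.

```python
import math

def truncation_parameters(n, C, eps):
    """Return (B, per-coordinate tail bound, union bound over n coordinates)."""
    B = (n * C ** 2 / eps) ** 0.25
    # Markov's inequality applied to x_i^4, with E[x_i^4] <= C.
    tail = C / B ** 4
    # Union bound over the n independent coordinates.
    return B, tail, n * tail

B, tail, total = truncation_parameters(n=100, C=3.0, eps=0.1)
```

With these sample values the per-coordinate bound is $\epsilon/(nC)$ and the union bound is $\epsilon/C$, which is at most $\epsilon$ because $C \ge 1$.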

By suitable shifting and rescaling, we can assume that the distribution satisfies $\mathbb{E}[x_i] = 0$, $\mathbb{E}[x_i^2] = 1$,
$\mathbb{E}[x_i^4] \le C$ and $|x_i| < B$.

The next step is to suitably discretize the distribution. Assume that the random variable $x_i$ has
a cumulative distribution function $F_i$ where $F_i(x) = \Pr[x_i \le x]$. Since $|x_i| < B$ we have $F_i(-B) = 0$
and $F_i(B) = 1$. We will define two sandwiching discrete distributions $x_i^{\ell}$ and $x_i^{u}$, with $x_i^{\ell}$ lying below $x_i$ and $x_i^{u}$ lying above it, whose cdfs $F_i^{\ell}$ and
$F_i^{u}$ satisfy:

$$F_i^{u}(x) \le F_i(x) \le F_i^{u}(x) + \gamma$$

$$F_i^{\ell}(x) - \gamma \le F_i(x) \le F_i^{\ell}(x)$$

where $\gamma$ is a granularity parameter (which will be chosen as inverse polynomial in $n$).

Let $g = 1/\gamma$. Our goal is to define bucket boundaries $b_0, \ldots, b_g$ by picking $b_k$ that satisfy $F_i(b_k) = k\gamma$.

Definition 9.2. For $k \in \{0, \ldots, g\}$, let $b_k$ be the smallest $x \in [-B, B]$ so that $F_i(x) \ge k\gamma$.

We can sample $x_i$ by first picking a bucket $k \in \{0, \ldots, g-1\}$ and then sampling from this
bucket according to the suitable conditional distribution, resulting in $x_i \in [b_k, b_{k+1}]$.
We now define the sandwiching distributions:

Definition 9.3. The random variable $x_i^{\ell}$ is uniformly distributed on $\{b_0, \ldots, b_{g-1}\}$ while $x_i^{u}$ is
uniformly distributed on $\{b_1, \ldots, b_g\}$. We define the family $\mathcal{F}$ of $2^n$ product distributions on $\mathbb{R}^n$ where
each coordinate is distributed independently according to $x_i^{\ell}$ or $x_i^{u}$.
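
Definitions 9.2 and 9.3 are easy to carry out for a finitely supported coordinate. The sketch below is a minimal illustration that assumes a discrete distribution given by exact `Fraction` probabilities with support strictly inside $(-B, B)$; the function names and the three-point example are hypothetical.

```python
from fractions import Fraction

def bucket_boundaries(support, probs, g, B):
    """Return [b_0, ..., b_g], with b_k the smallest x in [-B, B] such that F(x) >= k/g."""
    pairs = sorted(zip(support, probs))
    assert sum(probs) == 1 and all(-B < x < B for x, _ in pairs)
    boundaries = []
    for k in range(g + 1):
        target = Fraction(k, g)
        b_k = -B                      # F(-B) = 0, so -B already qualifies when k = 0
        if target > 0:
            cdf = Fraction(0)
            for x, p in pairs:
                cdf += p
                if cdf >= target:     # first support point where the cdf reaches k/g
                    b_k = x
                    break
        boundaries.append(b_k)
    return boundaries

def sandwich_supports(boundaries):
    """Supports (as lists with multiplicity) of the lower and upper discretized variables."""
    return boundaries[:-1], boundaries[1:]

probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
b = bucket_boundaries([-1, 0, 1], probs, g=4, B=2)
lower, upper = sandwich_supports(b)
```

Sampling the lower (respectively upper) variable then amounts to drawing a uniform index into `lower` (respectively `upper`).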

It follows that $\mathrm{SD}(x_i^{\ell}, x_i^{u}) \le \gamma$. Hence if we take any pair of variables $Y, Z$ from $\mathcal{F}$, by the
union bound we have $\mathrm{SD}(Y, Z) \le \gamma n$. The following lemma allows us to reduce the problem of
fooling halfspaces under the distribution $X$ to the problem of fooling a single distribution from the
family $\mathcal{F}$.

Lemma 9.4. Let $h : \mathbb{R}^n \to \{-1, 1\}$ be a halfspace and let $Y \in \mathcal{F}$. Then

$$|\mathbb{E}[h(X)] - \mathbb{E}[h(Y)]| \le 4\gamma n.$$

Proof. We will pick sandwiching distributions $Y^{\ell} = (y_1^{\ell}, \ldots, y_n^{\ell})$ and $Y^{u} = (y_1^{u}, \ldots, y_n^{u})$ from $\mathcal{F}$
(depending on the halfspace $h$) and construct a coupling of the three distributions $Y^{\ell}$, $X$ and $Y^{u}$
so that

$$h(Y^{\ell}) \le h(X) \le h(Y^{u}). \tag{16}$$

Let $h(x) = \mathrm{sgn}(\sum_i w_i x_i - \theta)$. For each $i$, if $w_i \ge 0$, then we set

$$y_i^{\ell} = x_i^{\ell}, \qquad y_i^{u} = x_i^{u},$$

whereas if $w_i < 0$, then we set

$$y_i^{\ell} = x_i^{u}, \qquad y_i^{u} = x_i^{\ell}.$$

Next we describe the coupling, coordinate by coordinate. Fix coordinate $i$. Pick $k \in \{0, \ldots, g-1\}$
at random. Set $x_i^{\ell} = b_k$ and $x_i^{u} = b_{k+1}$. We now set the random variables $y_i^{\ell}$, $y_i$
and $y_i^{u}$ to be either $x_i^{\ell}$ or $x_i^{u}$, based on their definitions. We pick $x_i$ conditioned on the $k$th bucket,
so that $b_k \le x_i \le b_{k+1}$. It follows that

$$w_i y_i^{\ell} \le w_i x_i \le w_i y_i^{u}$$

and hence

$$\sum_i w_i y_i^{\ell} \le \sum_i w_i x_i \le \sum_i w_i y_i^{u},$$

which implies Equation (16).

Since a halfspace is a statistical test, we have

$$\Pr[h(X) \ne h(Y^{u})] \le \Pr[h(Y^{\ell}) \ne h(Y^{u})] \le \mathrm{SD}(Y^{u}, Y^{\ell}) \le \gamma n. \tag{17}$$

If we replace $Y^{u}$ with $Y \in \mathcal{F}$, we have

$$\Pr[h(X) \ne h(Y)] \le \Pr[h(X) \ne h(Y^{u})] + \Pr[h(Y) \ne h(Y^{u})] \le 2\gamma n,$$

where we use Equation (17) and the fact that $\mathrm{SD}(Y, Y^{u}) \le \gamma n$. The claim follows since $h(X)$ and
$h(Y)$ take values in $\{-1, 1\}$. □
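
The coordinate-wise coupling in this proof can be simulated directly. The sketch below uses exact rational arithmetic and makes one simplifying assumption: within bucket $k$, the point $x_i$ is drawn from an arbitrary stand-in distribution on $[b_k, b_{k+1}]$, since the ordering argument only uses $b_k \le x_i \le b_{k+1}$. The function names, bucket boundaries, and weights are hypothetical.

```python
import random
from fractions import Fraction

def coupled_sample(boundaries, weights, rng):
    """Draw (Y_low, X, Y_up) coordinate by coordinate from the bucket coupling."""
    g = len(boundaries) - 1
    y_low, x, y_up = [], [], []
    for w in weights:
        k = rng.randrange(g)                       # each bucket carries the same mass
        lo, hi = boundaries[k], boundaries[k + 1]
        # Stand-in for the conditional law of x_i on bucket k: some point of [b_k, b_{k+1}].
        x.append(lo + Fraction(rng.randrange(11), 10) * (hi - lo))
        if w >= 0:
            y_low.append(lo)
            y_up.append(hi)
        else:                                      # a negative weight swaps the two roles
            y_low.append(hi)
            y_up.append(lo)
    return y_low, x, y_up

def halfspace(weights, theta, point):
    """sgn(sum_i w_i x_i - theta) with values in {-1, 1}."""
    return 1 if sum(w * p for w, p in zip(weights, point)) - theta >= 0 else -1
```

For any draw, the three halfspace values are ordered as in Equation (16), because the three weighted sums are ordered term by term.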

This lemma extends to fooling functions of halfspaces. 

Lemma 9.5. Let $f : \mathbb{R}^n \to \{-1, 1\}$ be a function of $d$ halfspaces $h_i : \mathbb{R}^n \to \{-1, 1\}$ given by
$f = g(h_1, \ldots, h_d)$ where $g : \{-1, 1\}^d \to \{-1, 1\}$. Then for any $Y \in \mathcal{F}$,

$$|\mathbb{E}[f(X)] - \mathbb{E}[f(Y)]| \le 4\gamma d n.$$

Proof. We consider the same coupling used in Lemma 9.4. We have

$$\Pr[f(X) \ne f(Y)] \le \Pr[(h_1(X), \ldots, h_d(X)) \ne (h_1(Y), \ldots, h_d(Y))] \le \sum_i \Pr[h_i(X) \ne h_i(Y)] \le 2\gamma d n.$$

The claim now follows since $g$ is Boolean valued. □

Finally, we need to show that for a suitable choice of $\gamma$, the expectation and the second and
fourth moments of $x_i^{\ell}$ and $x_i^{u}$ are nearly the same as those of $x_i$. We prove the claim for the fourth
moment; the other arguments are similar.

Lemma 9.6. We have

$$|\mathbb{E}[(x_i^{\ell})^4] - \mathbb{E}[x_i^4]| \le 2B^4\gamma, \qquad |\mathbb{E}[(x_i^{u})^4] - \mathbb{E}[x_i^4]| \le 2B^4\gamma.$$

Proof. It is clear that

$$\mathbb{E}[(x_i^{\ell})^4] = \gamma \sum_{k=0}^{g-1} b_k^4, \qquad \mathbb{E}[(x_i^{u})^4] = \gamma \sum_{k=1}^{g} b_k^4.$$

Our goal is to compare these with the 4th moment of $x_i$. The contribution of the $k$th bucket to
$\mathbb{E}[x_i^4]$ can be upper bounded by $\gamma \max(b_k^4, b_{k+1}^4)$ and lower bounded by $\gamma \min(b_k^4, b_{k+1}^4)$. Hence

$$\gamma \sum_{k=0}^{g-1} \min(b_k^4, b_{k+1}^4) \le \mathbb{E}[x_i^4] \le \gamma \sum_{k=0}^{g-1} \max(b_k^4, b_{k+1}^4).$$

By case analysis, the sequence $\max(b_k^4, b_{k+1}^4)$ takes on $g$ distinct values from $\{b_0^4, \ldots, b_g^4\}$. Similarly,
$\min(b_k^4, b_{k+1}^4)$ can take some value twice but every other value at most once. Hence both the
upper and lower bounds are within $2B^4\gamma$ of both $\mathbb{E}[(x_i^{\ell})^4]$ and $\mathbb{E}[(x_i^{u})^4]$. □
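
The fourth-moment comparison can be checked exactly on a small example. The sketch below assumes, purely for illustration, that $x_i$ is uniform over a finite list of integers whose length is a multiple of $g$, so that $b_k$ is the $(kN/g)$-th smallest value for $k \ge 1$ and $b_0 = -B$; the function name and the sample data are hypothetical.

```python
from fractions import Fraction

def fourth_moment_gaps(values, g, B):
    """Return (|E[(x^l)^4] - E[x^4]|, |E[(x^u)^4] - E[x^4]|, 2 B^4 gamma) for x uniform on values."""
    xs = sorted(values)
    m, rem = divmod(len(xs), g)
    assert rem == 0 and all(-B < x < B for x in xs)
    # b_0 = -B; for k >= 1, the smallest x with F(x) >= k/g is the (k*m)-th smallest value.
    b = [-B] + [xs[k * m - 1] for k in range(1, g + 1)]
    gamma = Fraction(1, g)
    true_moment = Fraction(sum(x ** 4 for x in xs), len(xs))
    lower_moment = gamma * sum(v ** 4 for v in b[:-1])   # x^l is uniform on b_0..b_{g-1}
    upper_moment = gamma * sum(v ** 4 for v in b[1:])    # x^u is uniform on b_1..b_g
    return abs(lower_moment - true_moment), abs(upper_moment - true_moment), 2 * B ** 4 * gamma

low_gap, up_gap, bound = fourth_moment_gaps(list(range(-8, 8)), g=4, B=10)
```

In this example both gaps are below the bound $2B^4\gamma = 5000$; the lower variable is far from tight here only because $b_0 = -B$ was chosen well outside the support.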

A similar argument shows that the second moment changes by at most $2B^2\gamma$ and the expectation
by $2B\gamma$. We pick $\gamma \le \frac{\epsilon}{2nB^4} = \Theta\left(\frac{\epsilon^2}{n^2C^2}\right)$ which is of the form $2^{-s}$ for some integer $s$. We have
$2^s \le O\left(\frac{n^2C^2}{\epsilon^2}\right)$, hence $s = \log(n^2C^2/\epsilon^2) + O(1)$. To sample from $x_i^{\ell}$ ($x_i^{u}$), we pick a random bit-string
of length $s$, treat it as a number $j \in \{0, \ldots, g-1\}$, and output $b_j$ ($b_{j+1}$).

Finally we rescale and shift, so that we again have $\mathbb{E}[y_i] = 0$, $\mathbb{E}[y_i^2] = 1$ and $\mathbb{E}[y_i^4] \le C$.

10 Bounded Independence fools functions of halfspaces 

In this section, we prove Theorem 1.5.

10.1 Reduction to upper polynomials for single halfspaces 

We now flesh out the reduction described in Section 2; i.e., we show how to prove Theorem 2.5
given upper sandwiching polynomials for a single halfspace with extra properties.

Lemma 10.1. Let $X$ be a random vector on the product set $\Omega$, and suppose we have order-$k$
polynomials $p_1, \ldots, p_d : \Omega \to \mathbb{R}$, as well as functions $h_1, \ldots, h_d : \Omega \to \{0, 1\}$. Write $\mathbf{p}_i = p_i(X)$ and
$\mathbf{h}_i = h_i(X)$. Assume that for each $i \in [d]$:

1. $\mathbf{p}_i \ge \mathbf{h}_i$ with probability 1;

2. $\mathbb{E}[\mathbf{p}_i - \mathbf{h}_i] \le \epsilon_0$;

3. $\Pr[\mathbf{p}_i > 1 + 1/d^2] \le \gamma$;

4. $\|\mathbf{p}_i\|_{2d} \le 1 + 2/d^2$.

If we write $p = p_1 p_2 \cdots p_d$, $h = h_1 h_2 \cdots h_d$, then $p$ is a polynomial of order at most $dk$, $p(X) \ge h(X)$
with probability 1, and

$$\mathbb{E}[p(X) - h(X)] \le 2d\epsilon_0 + 3d^2\sqrt{\gamma}. \tag{18}$$

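Proof. The first two parts of the claim are immediate, so it suffices to verify (18). We use the
telescoping sum (1), and thus it suffices to bound the general term as follows:

$$\mathbb{E}[\mathbf{h}_1 \cdots \mathbf{h}_{i-1}(\mathbf{p}_i - \mathbf{h}_i)\mathbf{p}_{i+1} \cdots \mathbf{p}_d] \le 2\epsilon_0 + 3d\sqrt{\gamma}. \tag{19}$$

We have

$$\begin{aligned}
\mathbb{E}[\mathbf{h}_1 \cdots \mathbf{h}_{i-1}(\mathbf{p}_i - \mathbf{h}_i)\mathbf{p}_{i+1} \cdots \mathbf{p}_d]
&\le \mathbb{E}[\mathbf{p}_1 \cdots \mathbf{p}_{i-1}(\mathbf{p}_i - \mathbf{h}_i)\mathbf{p}_{i+1} \cdots \mathbf{p}_d] \\
&\le 2\,\mathbb{E}[\mathbf{p}_i - \mathbf{h}_i] + \mathbb{E}\big[\mathbf{1}[\mathbf{p}_1 \cdots \mathbf{p}_{i-1}\mathbf{p}_{i+1} \cdots \mathbf{p}_d > 2] \cdot \mathbf{p}_1 \cdots \mathbf{p}_{i-1}(\mathbf{p}_i - \mathbf{h}_i)\mathbf{p}_{i+1} \cdots \mathbf{p}_d\big] \\
&\le 2\epsilon_0 + \mathbb{E}\Big[\Big(\sum_{i'=1}^{d} \mathbf{1}[\mathbf{p}_{i'} > 1 + 1/d^2]\Big) \prod_{j=1}^{d} \mathbf{p}_j\Big],
\end{aligned}$$

where in the last term we used the bounds $(1 + 1/d^2)^{d-1} \le 2$ and $\mathbf{p}_i - \mathbf{h}_i \le \mathbf{p}_i$. Thus we can
establish (19) by showing the bound

$$\sum_{i'=1}^{d} \mathbb{E}\Big[\mathbf{1}[\mathbf{p}_{i'} > 1 + 1/d^2] \prod_{j=1}^{d} \mathbf{p}_j\Big] \le 3d\sqrt{\gamma}.$$

This follows by bounding each summand individually:

$$\begin{aligned}
\mathbb{E}\Big[\mathbf{1}[\mathbf{p}_{i'} > 1 + 1/d^2] \prod_{j=1}^{d} \mathbf{p}_j\Big]
&\le \big\|\mathbf{1}[\mathbf{p}_{i'} > 1 + 1/d^2]\big\|_2 \cdot \prod_{j=1}^{d} \|\mathbf{p}_j\|_{2d} && \text{(Hölder's inequality)} \\
&\le \sqrt{\gamma} \cdot (1 + 2/d^2)^d \le 3\sqrt{\gamma},
\end{aligned}$$

as needed. □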
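
The telescoping decomposition used in this proof is the identity $p_1 \cdots p_d - h_1 \cdots h_d = \sum_{i=1}^{d} h_1 \cdots h_{i-1}(p_i - h_i)p_{i+1} \cdots p_d$, whose general term is the quantity bounded in (19). The sketch below checks the identity exactly at one sample point; the function name and the sample values are hypothetical.

```python
import math
from fractions import Fraction

def telescoping_terms(p, h):
    """Terms h_1..h_{i-1} (p_i - h_i) p_{i+1}..p_d for i = 1..d (0-based indices in the code)."""
    d = len(p)
    return [math.prod(h[:i]) * (p[i] - h[i]) * math.prod(p[i + 1:]) for i in range(d)]

# Sample values of p_i(X) and h_i(X) at one point X, with p_i >= h_i and h_i in {0, 1}.
p = [Fraction(3, 2), Fraction(1), Fraction(5, 4)]
h = [1, 0, 1]
terms = telescoping_terms(p, h)
```

Each term is non-negative whenever $p_i \ge h_i \ge 0$, which is why the proof can bound the terms one at a time.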



10.2 Tools for upper polynomials 



We construct the upper sandwiching polynomial needed in Lemma 10.1 using two key tools:
"DGJSV Polynomials", the family of univariate real polynomials constructed in [DGJ+09] for approximating
the sgn function; and our Regularity Lemma for halfspaces over general random
variables (Section 5).

Regarding the DGJSV Polynomials, the following is a key theorem from [DGJ+09] (slightly
adjusted for our purposes):



Theorem 10.2. ([DGJ+09]) Let $0 < a, b < 1$. Then there exists an even integer $K = K_{a,b}$ with

$$K \le C_0 \frac{\log(2/b)}{a}$$

($C_0$ is a universal constant), as well as an ordinary univariate real polynomial $P = P_{a,b}$ of degree $K$ with the following
behavior:

• $P(x) \ge 0$ for $x \in (-\infty, -1]$;

• $0 \le P(x) \le b$ for $x \in [-1, -a]$;

• $0 \le P(x) \le 1$ for $x \in [-a, 0]$;

• $1 \le P(x) \le 1 + b$ for $x \in [0, 1]$;

• $P(x) \ge 1$ for $x \in [1, \infty)$;

• $P(x) \le (4x)^K$ for all $|x| \ge 1$.

Note that the first five conditions imply $P(x) \ge \mathbf{1}[x \ge 0]$ for all $x \in \mathbb{R}$.

Regarding our Regularity Lemma for general halfspaces, we will use the following rephrasing of
Theorem 5.3 with simplified parameters:

Theorem 10.3. Let $t > 1$, $0 < \delta < 1$ and $0 < \eta$ be parameters. Then there exists an integer $L$
satisfying

$$L \le \mathrm{poly}(\log t, 1/\eta) \cdot \frac{1}{\delta}$$

such that the following holds. Suppose $x_1, \ldots, x_n$ is a sequence of independent $\eta$-HC random variables,
$\theta \in \mathbb{R}$, and $n \ge L$. Then there exists a set of coordinates $H \subseteq [n]$ of cardinality $L$ such that,
denoting

$$\theta' = \theta - \sum_{j \in H} x_j, \qquad z = \sum_{j \notin H} x_j$$

(these random variables are independent), we have three mutually exclusive and collectively exhaustive
events depending only on $\theta'$:

• Event BAD: $|\theta'| \le t\|z\|_2$ and the collection $\{x_j : j \notin H\}$ is not $\delta$-regular;

• Event NEAR: $|\theta'| \le t\|z\|_2$ and the collection $\{x_j : j \notin H\}$ is $\delta$-regular;

• Event FAR: $|\theta'| > t\|z\|_2$.

Furthermore, BAD has probability at most $O(1/t)$.

The reader will note that events BAD, NEAR, and FAR are defined somewhat peculiarly:
neither $\|z\|_2$ nor the (ir)regularity of $\{x_j : j \notin H\}$ is actually random. Furthermore, by our original
Theorem 5.3 we either have that $\{x_j : j \notin H\}$ is $\delta$-regular, in which case NEAR and FAR are
the only possible events, or the collection is not $\delta$-regular, in which case BAD and FAR are the
only possible events. Nevertheless, this tripartition of events makes our future analysis simpler.
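
The tripartition can be written as a small decision procedure. In the sketch below, `theta_prime`, `z_norm` (standing for $\|z\|_2$), and the Boolean `tail_is_regular` are hypothetical inputs; only the first depends on the random outcome of the head coordinates.

```python
def classify_event(theta_prime, t, z_norm, tail_is_regular):
    """Return which of the three events BAD, NEAR, FAR occurs."""
    if abs(theta_prime) > t * z_norm:
        return "FAR"
    # Here |theta'| <= t * ||z||_2, and the (non-random) regularity of the tail decides.
    return "NEAR" if tail_is_regular else "BAD"
```

For a fixed tail, only two of the three outcomes are reachable, matching the remark above.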



10.3 Statement of the main technical theorem, and how it completes the proof 

The main technical result we will prove is the following: 

Theorem 10.4. Let $d \ge 1$, $0 < \delta < 1$, and $t > 4$ be parameters. Let $X = (x_1, \ldots, x_n)$ be a vector
of independent $\eta$-HC random variables. Furthermore, let $T$ be an even integer such that the $x_i$'s
are $(T, 2, 4/t)$-hypercontractive. Assume $T \ge C_1 d\log(dt)$, where $C_1$ is a universal constant. Let
$\theta \in \mathbb{R}$ and let

$$h(x_1, \ldots, x_n) = \mathbf{1}[x_1 + \cdots + x_n - \theta \ge 0].$$

Then there exists a polynomial $p(x_1, \ldots, x_n)$ of order $k$, with

$$k \le \mathrm{poly}(\log t, 1/\eta) \cdot \frac{1}{\delta} + O(T/d),$$

satisfying the 4 properties appearing in Lemma 10.1 with

$$\epsilon_0 = O(\sqrt{\delta}) + O(\epsilon_1), \qquad \epsilon_1 = \frac{dt\log(dt)}{T}, \qquad \gamma = 2^{-T/d}.$$

As we now show, using Theorem 10.4 and Lemma 10.1 we can deduce Theorem 2.5 and hence
Theorem 1.5 simply by choosing parameters appropriately. Note that it is sufficient to prove Theorem 2.5
with $\epsilon \cdot \mathrm{polylog}(d/\epsilon)$ in place of $\epsilon$.

We will apply Theorem 10.4 with $\delta = \Theta(\epsilon^2/d^2)$ and

$$t = C_2 \frac{d^2}{\epsilon\alpha},$$

where $C_2$ is a large constant of our choosing. Regarding the hypercontractivity parameters, using
Fact 3.3 we may take

$$\eta = \Theta(\alpha^{1/2}), \qquad T = \Theta(t^2 \cdot \alpha\ln(2/\alpha)).$$

The necessary assumption that

$$T \ge C_1 d\log(td) \iff C_2^2 \cdot \Theta\left(\frac{d^4\ln(2/\alpha)}{\epsilon^2\alpha}\right) \ge C_1 d\log\left(C_2\frac{d^3}{\epsilon\alpha}\right)$$

is valid provided that $C_2$ is a sufficiently large constant.

We obtain from the theorem an upper $\epsilon_0$-sandwiching polynomial for $h$ with order

$$k = O(d^2/\epsilon^2) \cdot \mathrm{poly}(1/\alpha) + O(d^3/\epsilon^2) \cdot \mathrm{poly}(1/\alpha) \le O(d^3/\epsilon^2) \cdot \mathrm{poly}(1/\alpha),$$

where

$$\epsilon_0 = O(\epsilon/d) + \tilde{O}(\epsilon/d) = \tilde{O}(\epsilon/d)$$

and $\gamma$ is exponentially small in $d/(\epsilon\alpha)$. By using such polynomials in Lemma 10.1, we get upper
sandwiching polynomials for intersections of $d$ halfspaces with the claimed degree $kd = O(d^4/\epsilon^2) \cdot \mathrm{poly}(1/\alpha)$
and the claimed error $d\epsilon_0 = \epsilon \cdot \mathrm{polylog}(d/\epsilon)$.

10.4 Proof of Theorem 10.4

In this section, we prove Theorem 10.4. Let $H$ be the set of cardinality $L = \mathrm{poly}(\log t, 1/\eta) \cdot (1/\delta)$
coming from Theorem 10.3, and assume without loss of generality that $H = \{1, \ldots, L\}$. We use
the notation $\theta' = \theta - (x_1 + \cdots + x_L)$, $z = x_{L+1} + \cdots + x_n$, $\mathrm{BAD} = \mathrm{BAD}(x_1, \ldots, x_L)$ etc., with
boldface indicating randomness as usual. Given the outcomes for $x_1, \ldots, x_L$, we will handle the
three events BAD, NEAR, and FAR with separate ordinary real polynomials. More precisely,
our final (generalized) polynomial will be

$$p(x_1, \ldots, x_n) = \mathbf{1}[\mathrm{BAD}] \cdot 1 + \mathbf{1}[\mathrm{NEAR}] \cdot p^{\mathrm{near}}_{\theta'}(z) + \mathbf{1}[\mathrm{FAR}] \cdot p^{\mathrm{far}}_{\theta'}(z),$$

where

$$p^{\mathrm{near}}_{\theta'}(z) = P\left(\frac{z - \theta'}{2t\|z\|_2}\right)$$

and

$$p^{\mathrm{far}}_{\theta'}(z) = \mathbf{1}[\theta' < 0] \cdot 1 + \mathbf{1}[\theta' > 0] \cdot \left(\frac{z}{\theta'}\right)^{q},$$

where $q$ is a positive integer and $P$ is an ordinary real univariate polynomial to be specified later.
For typographic simplicity, we will write simply $p_{\theta'}$ in place of $p^{\mathrm{near}}_{\theta'}$ and $p^{\mathrm{far}}_{\theta'}$, with context dictating
which we are referring to.

Let us walk through the properties of $p$ we need to prove. Regarding its order, we will prove
that both

$$q \le O(T/d), \qquad \deg P \le O(T/d);$$

i.e., when $\theta'$ is fixed, $p_{\theta'}(x_{L+1}, \ldots, x_n)$ has degree at most $O(T/d)$ as an ordinary multivariate real
polynomial. Since $\theta'$, BAD, NEAR, and FAR are determined by $x_1, \ldots, x_L$ alone, it follows that
our final polynomial $p$ is a generalized polynomial of order at most $L + O(T/d)$, as needed for the
theorem.
Next, we discuss Condition 1, that $p(X) \ge h(X)$ always. For the BAD outcomes for $x_1, \ldots, x_L$
we have $p(X) = 1 \ge h(X)$. For the remaining outcomes, we will have $p(X) \ge h(X)$ as required
provided that in all cases

$$p_{\theta'}(z) \ge h_{\theta'}(z) \quad \text{for all } \theta' \text{ and } z, \tag{20}$$

where

$$h_{\theta'}(z) = \mathbf{1}[z - \theta' \ge 0].$$

Next, we discuss Condition 2, the bound $\mathbb{E}[p(X) - h(X)] \le \epsilon_0$. Since $\epsilon_0 = O(\sqrt{\delta}) + O(\epsilon_1)$, it suffices to prove an upper
bound of $O(\sqrt{\delta} + \epsilon_1)$. Recall that

$$\epsilon_1 = \frac{dt\log(dt)}{T}.$$

Note also that we will always have $T \le t^2$, since no random variable has stronger hypercontractivity
than do Gaussians, for which $T \le 1 + t^2/16$. It follows that we will always have $\epsilon_1 \ge 1/t$. Thus
the probability of BAD, which is at most $O(1/t)$, is $O(\epsilon_1)$ and can therefore
be neglected. Hence it suffices to show that

$$\mathbb{E}[p_{\theta'}(z) - h_{\theta'}(z)] \le O(\sqrt{\delta} + \epsilon_1) \tag{21}$$

holds in both of the following cases:

Case Near: $|\theta'| \le t\|z\|_2$ and the collection $\{x_{L+1}, \ldots, x_n\}$ is $\delta$-regular.

Case Far: $|\theta'| > t\|z\|_2$.

Next we discuss Condition 3, the bound $\Pr[p(X) > 1 + 1/d^2] \le 2^{-T/d}$. Again, since $p(X) = 1$
for the bad outcomes $x_1, \ldots, x_L$, it suffices to show that

$$\Pr[p_{\theta'}(z) > 1 + 1/d^2] \le 2^{-T/d} \tag{22}$$

holds in both Case Near and Case Far.

Finally, we discuss the bound $\|p(X)\|_{2d} \le 1 + 2/d^2$. We have

$$\mathbb{E}[p(X)^{2d}] \le (1 + 1/d^2)^{2d} + \mathbb{E}[p(X)^{2d} \cdot \mathbf{1}[p(X) > 1 + 1/d^2]] \le 1 + 3/d + \mathbb{E}[p(X)^{2d} \cdot \mathbf{1}[p(X) > 1 + 1/d^2]].$$

If we can show that

$$\mathbb{E}[p(X)^{2d} \cdot \mathbf{1}[p(X) > 1 + 1/d^2]] \le 1/d,$$

then we will have shown

$$\mathbb{E}[p(X)^{2d}] \le 1 + 4/d \le (1 + 2/d^2)^{2d},$$

as required. Thus it remains to establish the previous upper bound. Again, since $p(X) = 1$ for the
BAD outcomes $x_1, \ldots, x_L$, it suffices to show that

$$\mathbb{E}[p_{\theta'}(z)^{2d} \cdot \mathbf{1}[p_{\theta'}(z) > 1 + 1/d^2]] \le 1/d \tag{23}$$

holds in both Case Near and Case Far.

Summarizing, our goal is to construct univariate polynomials $p_{\theta'}(z)$ of degree at most $O(T/d)$
for each of Case Near and Case Far so that (20), (21), (22), and (23) all hold. We will first handle
Case Near, the more difficult case.
10.4.1 Case Near

In this case we have $|\theta'| \le t\|z\|_2$, where $z = x_{L+1} + \cdots + x_n$ is the sum of a $\delta$-regular collection
of independent random variables. Our task is to construct a real polynomial $p_{\theta'}(z)$ of degree at
most $O(T/d)$ such that bounds (20), (21), (22), and (23) all hold with respect to the function
$h_{\theta'}(z) = \mathbf{1}[z - \theta' \ge 0]$.

Given the parameters $d$ and $t$, choose

$$a = 16C_0 \frac{d\log(td)}{T}, \qquad b = \min(1/d^2, 1/t^4);$$

we have $a < 1$ assuming that the $C_1$ in our assumption on $T$ is large enough. Let $K = K_{a,b}$ and
$P = P_{a,b}$ be the resulting even integer and univariate polynomial from Theorem 10.2. Our choice
of $a$ was arranged so that

$$K \le \frac{T}{4d}. \tag{24}$$
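
The arithmetic behind (24) can be checked numerically. The sketch below assumes a placeholder value `C0 = 1.0` for the universal constant $C_0$ (its true value is not given here) and evaluates the degree bound $C_0\log(2/b)/a$ from Theorem 10.2; the function name and the sample parameters are hypothetical. The resulting bound does not depend on the placeholder or on the base of the logarithm, because both cancel in the ratio.

```python
import math

def near_case_parameters(d, t, T, C0=1.0):
    """Return (a, b, K_bound) for the Case Near choice of a and b."""
    a = 16 * C0 * d * math.log(t * d) / T
    b = min(1 / d ** 2, 1 / t ** 4)
    # Degree bound from the DGJSV theorem: K <= C0 * log(2/b) / a.
    K_bound = C0 * math.log(2 / b) / a
    return a, b, K_bound

a, b, K_bound = near_case_parameters(d=4, t=8, T=10000)
```

With these sample values the bound is $406.25$, below $T/(4d) = 625$. The comparison succeeds because $\log(2/b) \le 4\log(td)$ when $d \ge 2$ and $t > 4$.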



We will define

$$p_{\theta'}(z) = p_{\mathrm{near}}(\theta', z) = P(w), \quad \text{where } w = \frac{z - \theta'}{2t\|z\|_2}.$$

Thus $p_{\theta'}(z)$ has degree $K = O(T/d)$ as necessary, and it also satisfies (20), using the property
that $P \ge 0$ on $(-\infty, 0]$ and $P \ge 1$ on $[0, \infty)$.

Next we check (23), i.e.,

$$\mathbb{E}[p_{\theta'}(z)^{2d} \cdot \mathbf{1}[p_{\theta'}(z) > 1 + 1/d^2]] \le 1/d.$$

Since $b \le 1/d^2$, we have that $p_{\theta'}(z) > 1 + 1/d^2$ only if $|w| > 1$.

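Also notice that $p_{\theta'}(z) \le (4w)^K$ whenever $|w| \ge 1$, so it suffices to bound $\mathbb{E}[\mathbf{1}[|w| \ge 1] \cdot (4w)^{2dK}]$, and we will prove a
stronger result:

$$\mathbb{E}[(4w)^{2dK} \cdot \mathbf{1}[|w| \ge 1]] \le 2^{-T}. \tag{25}$$

To see this, since we are in Case Near we have $|\theta'| \le t\|z\|_2$. Thus if $|w| \ge 1$, we must have
$|z| \ge t\|z\|_2$. This also implies $|z - \theta'| \le 2|z|$; hence we have

$$|4w| = \frac{4|z - \theta'|}{2t\|z\|_2} \le \frac{4|z|}{t\|z\|_2}.$$

Thus we have

$$\mathbb{E}\big[\mathbf{1}[|w| \ge 1] \cdot (4w)^{2dK}\big] \le \mathbb{E}\left[\mathbf{1}[|z| \ge t\|z\|_2] \cdot \left(\frac{4z}{t\|z\|_2}\right)^{2dK}\right] = \left(\frac{4}{t}\right)^{2dK} \mathbb{E}\left[\mathbf{1}\left[\left|\frac{z}{\|z\|_2}\right| \ge t\right] \cdot \left(\frac{z}{\|z\|_2}\right)^{2dK}\right]. \tag{26}$$

It is easy to check that

$$\mathbf{1}\left[\left|\frac{z}{\|z\|_2}\right| \ge t\right] \cdot \left(\frac{z}{\|z\|_2}\right)^{2dK} \le t^{2dK - T} \left|\frac{z}{\|z\|_2}\right|^{T},$$

using the fact that $2dK \le T$. Thus we may upper-bound (26) by

$$4^{2dK} t^{-T} \frac{\|z\|_T^T}{\|z\|_2^T} \le 4^{2dK} t^{-T} (t/4)^T = 4^{2dK - T},$$

where we used the $(T, 2, 4/t)$-hypercontractivity of $z$. Since we have

$$2dK \le T/2, \tag{27}$$

by virtue of (24), we conclude

$$\mathbb{E}[p_{\theta'}(z)^{2d} \cdot \mathbf{1}[p_{\theta'}(z) > 1 + 1/d^2]] \le 4^{-T/2} = 2^{-T} \le 1/d. \tag{28}$$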

Let us move on to showing (22) in this Case Near; i.e., upper-bounding $\Pr[p_{\theta'}(z) > 1 + 1/d^2]$.
Since $b \le 1/d^2$, again we have that $p_{\theta'}(z) > 1 + 1/d^2$ only if $|w| > 1$. But by (25),

$$\mathbb{E}[\mathbf{1}[|w| \ge 1] \cdot (4w)^{2dK}] \le 2^{-T},$$

and the left-hand side is clearly an upper bound on $\Pr[|w| \ge 1]$. Thus we have established (22) in
Case Near.

Last, we will work to upper bound $\mathbb{E}[p_{\theta'}(z) - h_{\theta'}(z)]$ so as to show (21) in Case Near. We
analyze three subcases, depending on the magnitude of $w$.

Case i: $-a \le w < 0$. In this case, we upper-bound $p_{\theta'}(z) - h_{\theta'}(z)$ simply by 1, and argue that
Case i occurs with low probability. Specifically,

$$\Pr[-a \le w < 0] \le \Pr[|w| \le a] = \Pr[|z - \theta'| \le 2ta \cdot \|z\|_2].$$

We can upper-bound this probability using the Berry-Esseen Theorem (see Corollary 4.5). Since
we have $\delta$-regularity of $x_{L+1}, \ldots, x_n$ in Case Near, we get

$$\Pr[|z - \theta'| \le 2ta \cdot \|z\|_2] \le O(\sqrt{\delta} + ta).$$

By definition of $a$ we have $O(ta) = O(\epsilon_1)$. Thus we conclude for Case i,

$$\mathbb{E}[\mathbf{1}[\text{Case i}] \cdot (p_{\theta'}(z) - h_{\theta'}(z))] \le O(\sqrt{\delta} + \epsilon_1). \tag{29}$$

Case ii: $|w| \le 1$ but not Case i. In this case, we have $p_{\theta'}(z) - h_{\theta'}(z) \le b \le 1/t^4$, by
construction. Thus

$$\mathbb{E}[\mathbf{1}[\text{Case ii}] \cdot (p_{\theta'}(z) - h_{\theta'}(z))] \le 1/t^4 \le O(\epsilon_1). \tag{30}$$

Case iii: $|w| > 1$. I.e., $|z - \theta'| > 2t\|z\|_2$. Notice that $p_{\theta'}(z) - h_{\theta'}(z) \le p_{\theta'}(z)$ and therefore

$$\mathbb{E}[\mathbf{1}[\text{Case iii}] \cdot (p_{\theta'}(z) - h_{\theta'}(z))] \le \mathbb{E}[p_{\theta'}(z) \cdot \mathbf{1}[|w| \ge 1]] \le \mathbb{E}[\mathbf{1}[|w| \ge 1] \cdot (4w)^{2dK}] \le 2^{-T} \le O(\epsilon_1)$$

(the second last inequality is due to (25)).
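10.4.2 Case Far

If $\theta' < 0$ then $h_{\theta'}$ is almost always 1. As stated, in this case we simply have $p_{\theta'}(z) = 1$. Bounds (20),
(22), and (23) become trivial; for (21) it suffices to show

$$\Pr[z < \theta'] \le \epsilon_1. \tag{31}$$

We will show a stronger statement in the course of handling the case that $\theta' > 0$.

So it remains to handle the $\theta' > 0$ case. As stated, in this case we define

$$p_{\theta'}(z) = p_{\mathrm{far}}(\theta', z) = \left(\frac{z}{\theta'}\right)^{q},$$

where

$$q = \left\lfloor \frac{T}{2d} \right\rfloor_{\mathrm{even}},$$

meaning $T/2d$ rounded down to the nearest even integer. Note that $p_{\theta'}(z)$ has the claimed degree
bound $O(T/d)$ (treating $\theta'$ as a constant). Also note that $p_{\theta'}(z) \ge 1$ if and only if $|z| \ge \theta'$. This
establishes (20).

Let's move to (22); we need

$$\Pr[p_{\theta'}(z) > 1 + 1/d^2] \le 2^{-T/d}.$$

Certainly

$$p_{\theta'}(z) > 1 + 1/d^2 \implies p_{\theta'}(z) \ge 1 \iff |z| \ge |\theta'|.$$

It thus suffices to show

$$\Pr[|z| \ge |\theta'|] \le 2^{-T/d},$$

which, once shown, also establishes (31), since $2^{-T/d} \ll \epsilon_1$. We will in fact show the stronger
statement

$$\mathbb{E}\left[\left(\frac{z}{\theta'}\right)^{q}\right] \le 2^{-T/d}. \tag{32}$$

And this stronger statement establishes (21), again because $2^{-T/d} \le \epsilon_1$.
To prove (32) we appeal to the condition of Case Far, $|\theta'| > t\|z\|_2$. Thus

$$\begin{aligned}
\mathbb{E}\left[\left(\frac{z}{\theta'}\right)^{q}\right]
&\le \mathbb{E}\left[\left(\frac{z}{t\|z\|_2}\right)^{q}\right] \\
&\le \left(\mathbb{E}\left[\left(\frac{z}{t\|z\|_2}\right)^{T}\right]\right)^{q/T} && \text{(Jensen, since } T/q \ge 1\text{)} \\
&= \left(\frac{\|z\|_T}{t\|z\|_2}\right)^{q} \le 4^{-q} && \text{(by } (T, 2, 4/t)\text{-hypercontractivity of } z\text{)} \\
&\le 2^{-T/d},
\end{aligned}$$

using the definition of $q$.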
Finally, to prove (23) it certainly suffices to show

$$1/d \ge \mathbb{E}[p_{\theta'}(z)^{2d}] = \mathbb{E}\left[\left(\frac{z}{\theta'}\right)^{2dq}\right].$$

By repeating the previous inequality with $2dq$ in place of $q$ (we still have $T/(2dq) \ge 1$), we can upper-bound
the expectation by $4^{-2dq}$, which is indeed at most $1/d$. This concludes the verification of
Case Far, and thus all of Theorem 10.4.




11 Fooling the uniform distribution on the sphere 

In this section, we will show that our PRG can also be used to fool any function of $d$ halfspaces
over the uniform distribution on the $n$-dimensional unit sphere; building such a PRG also has an
application in derandomizing the hardness of learning reduction in [KS08].

The main idea is to show that the $n$-dimensional Gaussian distribution can be used to fool the
uniform distribution on the sphere. Therefore, it suffices to fool the $n$-dimensional Gaussian, which is
studied in the previous sections (either using the modified MZ generator or $k$-wise independence).

Specifically, we first show the following connection between the $n$-dimensional Gaussian distribution
$\mathcal{N}(0, 1/\sqrt{n})^n$ and the uniform distribution on the $n$-dimensional unit sphere $S^{n-1}$.

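Lemma 11.1. For any $\theta_1, \theta_2, \ldots, \theta_d \in \mathbb{R}$ and $W_1, W_2, \ldots, W_d \in \mathbb{R}^n$, with $h_i(X) = \mathrm{sgn}(W_i \cdot X - \theta_i)$ and
$f : \{0, 1\}^d \to \{0, 1\}$, there is some universal constant $C$ such that

$$\left| \mathbb{E}_{X \in_u S^{n-1}}[f(h_1(X), \ldots, h_d(X))] - \mathbb{E}_{X \in_u \mathcal{N}(0, 1/\sqrt{n})^n}[f(h_1(X), h_2(X), \ldots, h_d(X))] \right| \le \frac{Cd\log n}{n^{1/4}}. \tag{33}$$

Proof. Notice that if we choose $X \in_u \mathcal{N}(0, 1/\sqrt{n})^n$, then $X/\|X\|_2$ follows the uniform distribution on
the sphere. Therefore, we only need to bound:

$$\begin{aligned}
&\left| \mathbb{E}_{X \in_u \mathcal{N}(0, 1/\sqrt{n})^n}\left[f\left(h_1\left(\tfrac{X}{\|X\|_2}\right), \ldots, h_d\left(\tfrac{X}{\|X\|_2}\right)\right)\right] - \mathbb{E}_{X \in_u \mathcal{N}(0, 1/\sqrt{n})^n}[f(h_1(X), h_2(X), \ldots, h_d(X))] \right| \\
&\quad \le \Pr_{X \in_u \mathcal{N}(0, 1/\sqrt{n})^n}\left[f\left(h_1\left(\tfrac{X}{\|X\|_2}\right), \ldots, h_d\left(\tfrac{X}{\|X\|_2}\right)\right) \ne f(h_1(X), h_2(X), \ldots, h_d(X))\right] \\
&\quad \le \sum_{i=1}^{d} \Pr_{X \in_u \mathcal{N}(0, 1/\sqrt{n})^n}\left[h_i\left(\tfrac{X}{\|X\|_2}\right) \ne h_i(X)\right].
\end{aligned} \tag{34}$$

By Lemma 6.2 in [MZ09], we know that:

$$\Pr_{X \in_u \mathcal{N}(0, 1/\sqrt{n})^n}\left[h_i\left(\tfrac{X}{\|X\|_2}\right) \ne h_i(X)\right] \le \frac{C\log n}{n^{1/4}}.$$

Combining the above inequality with (34), we prove (33). □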
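
The first step of this proof, normalizing a Gaussian vector to obtain a point on the sphere, is easy to simulate. The sketch below assumes that $1/\sqrt{n}$ denotes the standard deviation of each coordinate (so that $\|X\|_2$ concentrates around 1), and it estimates empirically how often one halfspace changes value under the normalization. All function names, weights, and thresholds are hypothetical.

```python
import math
import random

def gaussian_point(n, rng):
    """Sample X with independent N(0, 1/n) coordinates, i.e. standard deviation 1/sqrt(n)."""
    sigma = 1 / math.sqrt(n)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def project_to_sphere(x):
    """Return X / ||X||_2, a point on the unit sphere."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def sgn_halfspace(w, theta, x):
    return 1 if sum(a * b for a, b in zip(w, x)) - theta >= 0 else -1

def disagreement_rate(w, theta, n, trials, rng):
    """Empirical estimate of Pr[h(X / ||X||_2) != h(X)] for h(x) = sgn(w . x - theta)."""
    bad = 0
    for _ in range(trials):
        x = gaussian_point(n, rng)
        bad += sgn_halfspace(w, theta, x) != sgn_halfspace(w, theta, project_to_sphere(x))
    return bad / trials
```

Such an estimate is only a sanity check; the bound itself comes from Lemma 6.2 of [MZ09].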

Therefore to fool any function of $d$ halfspaces over the uniform distribution on the $n$-dimensional
sphere with accuracy $O\left(\frac{Cd\log n}{n^{1/4}}\right)$, it suffices to build a PRG for the $n$-dimensional Gaussian distribution
with the same accuracy.
11.1 Derandomized hardness of learning intersections of halfspaces

One application of the above PRG is that we can use it to derandomize the hardness of learning
result in [KS08]. In [KS08], Khot and Saket showed that assuming NP $\ne$ RP, for any $\epsilon > 0$ and
positive integer $d$, given a set of examples such that there is an intersection of two halfspaces that
is consistent with all the examples, it is NP-hard to find a function of any $d$ halfspaces that is
consistent with a $1/2 + O(\epsilon)$ fraction of the examples. Our PRGs can be used to derandomize the
hardness reduction and obtain the same hardness result assuming NP $\ne$ P.

To see why our PRG works, we need to look into the details of [KS08]. Let us explain at a
high level why our PRG helps, without entering into the details of the reduction. The hardness of
learning result in [KS08] is based on a reduction from a Label Cover instance $\mathcal{L}$ to a distribution $\mathcal{D}_0$
on negative examples and a distribution $\mathcal{D}_1$ on positive examples. Such a reduction preserves
the following two properties:

• (Completeness) if the optimum value of $\mathcal{L}$ is 1, then there is an intersection of two halfspaces
$f(x)$ that agrees with all the examples; i.e., $\mathbb{E}_{\mathcal{D}_1}[f(X)] = \mathbb{E}_{\mathcal{D}_0}[f(X)] + 1$.

• (Soundness) if the optimum value of $\mathcal{L}$ is small, then for any $h(x)$ which is a function of $d$
halfspaces, we have that $|\mathbb{E}_{\mathcal{D}_1}[h(X)] - \mathbb{E}_{\mathcal{D}_0}[h(X)]| = O(\epsilon)$, which implies that $h(x)$ agrees
with at most a $1/2 + O(\epsilon)$ fraction of the examples.

The distribution $\mathcal{D}_i$ (for $i = 0, 1$) constructed in [KS08] is a mixture of uniform distributions on spheres
located at different centers, and the number of different spheres is $\mathrm{poly}(n)$, where $n$ is the size
of the Label Cover instance. Then by the PRG in this paper, we can derandomize each sphere
with some distribution that only has support of size $\mathrm{poly}(n)$ to $\epsilon$-fool functions of $d$ halfspaces.
Overall we get distributions $\mathcal{P}_0$ and $\mathcal{P}_1$ with $\mathrm{poly}(n)$ support, with the property that for any
function $h(x)$ of $d$ halfspaces, $|\mathbb{E}_{\mathcal{D}_i}[h(X)] - \mathbb{E}_{\mathcal{P}_i}[h(X)]| \le O(\epsilon)$ for $i = 0, 1$. If we replace $\mathcal{D}_i$ with $\mathcal{P}_i$ in
the hardness reduction, we still get the soundness guarantee that $|\mathbb{E}_{\mathcal{P}_1}[h(X)] - \mathbb{E}_{\mathcal{P}_0}[h(X)]| = O(\epsilon)$.

We also need to verify that the completeness property will hold if we replace $\mathcal{D}_i$ with $\mathcal{P}_i$. If we
look into the reduction of [KS08], as long as the distribution $\mathcal{P}_i$ has all its support points on the
sphere, the reduction will preserve the completeness property. Therefore, to make the reduction
work, we need to build a PRG for functions of $d$ halfspaces over the uniform distribution on the
sphere with the additional property that all the points generated by the PRG lie on the unit
sphere as well.

This is also achievable and we summarize the high-level idea here. As is shown in Lemma 11.1,
it suffices to fool functions of $d$ halfspaces over the $n$-dimensional Gaussian instead of the uniform
distribution on the sphere. In addition, by the proof of Theorem 4.4, if we only want to fool
functions of $d$ $\epsilon$-regular halfspaces, it suffices to fool the uniform distribution on $\{-1/\sqrt{n}, 1/\sqrt{n}\}^n$
instead. For the uniform distribution over $\{-1/\sqrt{n}, 1/\sqrt{n}\}^n$, we know that it can be fooled by a
PRG with all the support points in $\{-1/\sqrt{n}, 1/\sqrt{n}\}^n$, which is a subset of the unit sphere. To
handle the case that the $d$ halfspaces are not all $\epsilon$-regular, we can follow the idea of [MZ09, Lemma 6.3]
by showing that there exists a set of $\mathrm{poly}(n)$ unitary rotations such that, with high probability, all
of the $d$ halfspaces become regular under a rotation randomly chosen from the set.